Abstract

This chapter deals with theoretical developments in the subject of quantum
information and quantum computation, and includes an overview of classical
information and some relevant quantum mechanics. The discussion covers topics in
quantum communication, quantum cryptography, and quantum computation, and
concludes by considering whether a perspective in terms of quantum information
sheds new light on the conceptual problems of quantum mechanics.
1 Introduction

The subject of quantum information has its roots in the debate about conceptual issues
in the foundations of quantum mechanics.

The story really begins with the dispute between Einstein and Bohr about the
interpretation of quantum states, in particular the interpretation of so-called 'entangled
states,' which exhibit peculiar nonlocal statistical correlations for widely separated
quantum systems. See, for example, [Bohr, 1949, p. 283] and Einstein's reply in the
same volume [Schilpp, 1949]. Einstein took the position that quantum mechanics is
simply an incomplete theory. On the basis of a certain restricted set of correlations for
a pair of systems in a particular entangled state, Einstein, Podolsky, and Rosen (EPR)
argued in a seminal paper [Einstein et al., 1935] that the phenomenon of entanglement
conflicts with certain basic realist principles of separability and locality that all physical
theories should respect, unless we regard quantum states as incomplete descriptions.

Bohr's view, which he termed 'complementarity,' eventually became entrenched as
the orthodox Copenhagen interpretation, a patchwork of reformulations by Heisenberg,
Pauli, von Neumann, Dirac, Wheeler, and others. (For a discussion, see [Howard, 2004]
and Landsman, this vol., ch. 5.) As Pauli put it in correspondence with Max Born
[Born, 1971, p. 218], a 'detached observer' description of the sort provided by classical
physics is precluded by the nature of quantum phenomena, and a quantum description
of events is as complete as it can be (in principle). Any application of quantum theory
requires a 'cut' between the observer and the observed, or the macroscopic measuring
instrument and the measured system, so that the description is in a certain sense
contextual, where the relevant context is defined by the whole macroscopic experimental
arrangement. So, for example, a 'position measurement context' provides information
about position but excludes, in principle, the possibility of simultaneously obtaining
momentum information, because there is no fact of the matter about momentum in this
context: the momentum value is indeterminate. The Copenhagen interpretation
conflicts with Einstein's realism, his 'philosophical prejudice,' as Pauli characterized it in a
letter to Born [Born, 1971, p. 221], that lies at the heart of the dispute between Einstein
and Bohr about the significance of the transition from classical to quantum mechanics.

The 1990s saw the development of a quantum theory of information, based on the
realization that entanglement, rather than being a minor source of embarrassment for
physics that need only concern philosophers, can actually be exploited as a nonclassical
communication channel to perform information-processing tasks that would be
impossible in a classical world. In a two-part commentary on the EPR paper, Schrödinger
[1935, p. 555] identified entanglement as 'the characteristic trait of quantum theory,
the one that enforces its entire departure from classical lines of thought.' This has
led to an explosive surge of research among physicists and computer scientists on the
application of information-theoretic ideas to quantum computation (which exploits
entanglement in the design of a quantum computer, so as to enable the efficient
performance of certain computational tasks), to quantum communication (new forms of
'entanglement-assisted' communication, such as quantum teleportation), and to
quantum cryptography (the identification of cryptographic protocols that are guaranteed to
be unconditionally secure against eavesdropping or cheating, by the laws of quantum
mechanics, even if all parties have access to quantum computers).

Some milestones: Bell's analysis [1964] turned the EPR argument on its head by
showing that Einstein's assumptions of separability and locality, applicable in
classical physics and underlying the EPR incompleteness argument, are incompatible with
certain quantum statistical correlations (not explicitly considered by EPR) of
separated systems in EPR-type entangled states. Later experiments [Aspect et al., 1981;
Aspect et al., 1982] confirmed these nonclassical correlations in set-ups that excluded
the possibility of any sort of physically plausible, non-superluminal, classical
communication between the separated systems.

In the 1980s, various authors, e.g., Wiesner, Bennett, and Brassard [Wiesner, 1983;
Bennett and Brassard, 1984; Bennett et al., 1982], pointed out that one could exploit
features of the measurement process in quantum mechanics to thwart the possibility of
undetected eavesdropping in certain cryptographic procedures, specifically in key
distribution, a procedure in which two parties, Alice and Bob, who initially share no
information end up each holding a secret random key which can be used to send encrypted
messages between them. No third party, Eve, can obtain any information about the
communications between Alice and Bob that led to the establishment of the key
without Alice and Bob becoming aware of Eve's interference, because Eve's measurements
necessarily disturb the quantum states of the systems in the communication channel.






Bennett [1973] showed how to make a universal Turing machine reversible for any
computation, a required step in the design of a quantum computer that evolves via
unitary (and hence reversible) state transformations, and Benioff [1980] developed
Hamiltonian models for computers. Feynman [1982] considered the problem of
efficiently simulating the evolution of physical systems using quantum resources
(noting that the classical simulation of a quantum process would be exponentially costly),
which involves the idea of a quantum computation, but it was Deutsch [1985; 1989]
who characterized the essential features of a universal quantum computer and
formulated the first genuinely quantum algorithm.

Following Deutsch's work on quantum logic gates and quantum networks,
several quantum algorithms were proposed for performing computational tasks more
efficiently than any known classical algorithm, or in some cases more efficiently than any
classical algorithm. The most spectacular of these is Shor's algorithm [1994; 1997]
for finding the two prime factors of a positive integer N = pq, which is exponentially
faster than the best-known classical algorithm. Since prime factorization is the basis of
the most widely used public-key encryption scheme (currently universally applied in
communications between banks and commercial transactions over the internet), Shor's
result has enormous practical significance.

In the following, I present an account of some of the theoretical developments in 
quantum information, quantum communication, quantum cryptography, and quantum 
computation. I conclude by considering whether a perspective in terms of quantum 
information suggests a new way of resolving the foundational problems of quantum 
mechanics that were the focus of the debate between Einstein and Bohr.

My discussion is heavily indebted to Michael Nielsen and Isaac Chuang's
illuminating and comprehensive Quantum Computation and Quantum Information [2000],
and to several insightful review articles: 'The Joy of Entanglement' by Sandu Popescu
and Daniel Rohrlich [1998], 'Quantum Information and its Properties' by Richard
Jozsa [1998], and 'Quantum Computing' by Andrew Steane [1998].

2 Classical Information 

2.1 Classical Information Compression and Shannon Entropy 

In this section, I review the basic elements of classical information theory. In §2.1,
I introduce the notion of the Shannon entropy of an information source and the fun- 
damental idea of information compression in Shannon's source coding theorem (or 
noiseless channel coding theorem). In §2.2, I define some information-theoretic con- 
cepts relevant to Shannon's noisy channel coding theorem. 

The classical theory of information was initially developed to deal with certain
questions in the communication of electrical signals. Shannon's ground-breaking paper
'A Mathematical Theory of Communication' [Shannon, 1948] followed earlier work by
people like Nyquist [1924] and Hartley [1928] in the 1920s. The basic problem was
the representation of messages, selected from an ensemble generated by a stochastic
process at the message source, in such a way as to ensure their efficient transmission
over an electrical circuit such as a noisy telegraph wire.
A communication set-up involves a transmitter or source of information, a (possibly
noisy) channel, and a receiver. The source produces messages in the form of
sequences of symbols from some alphabet, which Shannon represented mathematically
as sequences of values of independent, identically distributed random variables.
In later idealizations, the source is represented as stationary, in the sense (roughly) that 
the probability of any symbol (or n-tuple of symbols) appearing at any given position 
in a (very long) sequence, when that position is considered with respect to an ensemble 
of possible sequences, is the same for all positions in the sequence, and ergodic, in the 
sense that this 'ensemble average' probability is equal to the 'time average' probability, 
where the time average refers to the probability of a symbol (or n-tuple of symbols) in 
a given (very long) sequence. 

The fundamental question considered by Shannon was how to quantify the minimal 
physical resources required to store messages produced by a source, so that they could 
be communicated via a channel without loss and reconstructed by a receiver. Shannon's 
source coding theorem (or noiseless channel coding theorem) answers this question. 

To see the idea behind the theorem, consider a source that produces long sequences
(messages) composed of symbols from a finite alphabet $a_1, a_2, \ldots, a_k$, where the
individual symbols are produced with probabilities $p_1, p_2, \ldots, p_k$. A given sequence of
symbols is represented as a sequence of values of independent, identically distributed,
discrete random variables $X_1, X_2, \ldots$. A typical sequence of length $n$, for large $n$, will
contain close to $p_i n$ symbols $a_i$, for $i = 1, \ldots, k$. So the probability of a sufficiently
long typical sequence (assuming independence) will be:

$$p(x_1, x_2, \ldots, x_n) = p(x_1) p(x_2) \cdots p(x_n) \approx p_1^{p_1 n} p_2^{p_2 n} \cdots p_k^{p_k n}. \tag{1}$$

Taking the logarithm of both sides (conventionally, in information theory, to the base
2) yields:

$$\log p(x_1, \ldots, x_n) \approx n \sum_i p_i \log p_i := -nH(X) \tag{2}$$

where $H(X) := -\sum_i p_i \log p_i$ is the Shannon entropy of the source.

We can think about information in Shannon's sense in various ways. We can take
$-\log p_i$, a decreasing function of $p_i$ with a minimum value of 0 when $p_i = 1$ for some
$i$, as a measure of the information associated with identifying the symbol $a_i$ produced
by an information source. Then $H(X) = -\sum_i p_i \log p_i$ is the average information
gain, or the expectation value of the information gain associated with ascertaining the
value of the random variable $X$. Alternatively, we can think of the entropy as a measure
of the amount of uncertainty about $X$ before we ascertain its value. A source that
produces one of two distinguishable symbols with equal probability, such as the toss
of a fair coin, is said to have a Shannon entropy of 1 bit: ascertaining which symbol is
produced, or reducing one's uncertainty about which symbol is produced, is associated
with an amount of information equal to 1 bit.¹ If we already know which symbol will
be produced (so the probabilities are 1 and 0), the entropy is 0: there is no uncertainty,
and no information gain.

¹ Note that the term 'bit' (for 'binary digit') is used to refer to the basic unit of classical information in
terms of Shannon entropy, and to an elementary two-state classical system considered as representing the
possible outputs of an elementary classical information source.
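
The two limiting cases just described (a fair coin and a source whose output is known in advance) can be checked numerically. The following is a minimal sketch; the function name `shannon_entropy` and the variable names are illustrative assumptions, not notation from the text.

```python
import math

def shannon_entropy(probs):
    """Return H(X) = -sum_i p_i log2 p_i in bits, taking 0 log 0 = 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])   # maximal uncertainty for two symbols
certain = shannon_entropy([1.0, 0.0])     # outcome known in advance
biased = shannon_entropy([0.9, 0.1])      # strictly between 0 and 1 bit
print(fair_coin, certain, biased)
```

The biased source illustrates the intermediate case: some uncertainty remains, but less than one bit per symbol.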
Since

$$p(x_1, \ldots, x_n) = 2^{-nH(X)} \tag{3}$$

for sufficiently long typical sequences, and the probability of all the typical $n$-length
sequences is less than 1, it follows that there are at most $2^{nH(X)}$ typical sequences.
In fact, if the $p_i$ are not all equal, the typical sequences comprise an exponentially
small set $T$ (of equiprobable typical sequences) in the set of all sequences as $n \to \infty$,
but since the probability that the source produces an atypical sequence tends to zero
as $n \to \infty$, the set of typical sequences has probability close to 1. So each typical
$n$-sequence could be encoded as a distinct binary number of $nH(X)$ binary digits or
bits before being sent through the channel to the receiver, where the original sequence
could then be reconstructed by inverting the 1-1 encoding map. (The reconstruction
would fail, with low probability, only for the rare atypical sequences, each of which
could be encoded as, say, a string of 0's.)

Notice that if the probabilities $p_i$ are all equal ($p_i = 1/k$ for all $i$), then
$H(X) = \log k$, and if some $p_j = 1$ (and so $p_i = 0$ for $i \neq j$), then $H(X) = 0$ (taking
$0 \log 0 = \lim_{x \to 0} x \log x = 0$). It can easily be shown that:

$$0 \leq H(X) \leq \log k. \tag{4}$$

If we encoded each of the $k$ distinct symbols as a distinct binary number, i.e., as a
distinct string of 0's and 1's, we would need binary numbers composed of $\log k$ bits
to represent each symbol ($2^{\log k} = k$). So Shannon's analysis shows that messages
produced by a stochastic source can be compressed, in the sense that (as $n \to \infty$ and
the probability of an atypical $n$-length sequence tends to zero) $n$-length sequences can
be encoded without loss of information using $nH(X)$ bits rather than the $n \log k$ bits
required if we encoded each of the $k$ symbols as a distinct string of 0's and 1's: this
is a compression, since $nH(X) < n \log k$ except for equiprobable distributions.

More precisely, let $\bar{X} = \frac{1}{n}(X_1 + X_2 + \ldots + X_n)$, where $X_1, X_2, \ldots, X_n$ are $n$
independent and identically distributed random variables with mean $\langle X \rangle$ and finite
variance. The weak law of large numbers tells us that, for any $\epsilon, \delta > 0$,

$$\Pr(|\bar{X} - \langle X \rangle| > \delta) < \epsilon \tag{5}$$

for sufficiently large $n$.

Now consider a random variable $X$ that takes values $x$ in an alphabet $\mathcal{X}$ with
probabilities $p(x) = \Pr(X = x)$, $x \in \mathcal{X}$.² Let

$$Z = -\log p(X) \tag{6}$$

be a function of $X$ that takes the value $-\log p(x)$ when $X$ takes the value $x$. Then

$$\langle Z \rangle = -\sum_{x \in \mathcal{X}} p(x) \log p(x) = H(X) \tag{7}$$

² Note that $p(x)$ is an abbreviation for $p_X(x)$, so $p(x)$ and $p(y)$ refer to two different random variables.
The expression $\Pr(X \in S) = \sum_{x \in S} p(x)$ denotes the probability that the random variable $X$ takes a
value in the set $S$, and $\Pr(X = x)$ denotes the probability that $X$ takes the value $x$. The expression
$p(x_1, x_2, \ldots, x_n)$ denotes the probability that the sequence of random variables $X_1, X_2, \ldots, X_n$ takes
the sequence of values $(x_1, x_2, \ldots, x_n)$. The discussion here follows Cover and Thomas [1991] and I use
their notation.
and for a sequence of $n$ independent and identically distributed random variables
$X_1, X_2, \ldots, X_n$:

$$-\frac{1}{n} \log p(X_1, \ldots, X_n) = -\frac{1}{n} \sum_i \log p(X_i) = \frac{1}{n}(Z_1 + \ldots + Z_n) = \bar{Z}. \tag{8}$$

So, by the weak law of large numbers, for any $\epsilon, \delta > 0$ and sufficiently large $n$:

$$\Pr(|\bar{Z} - \langle Z \rangle| > \delta) < \epsilon \tag{9}$$

i.e.,

$$\Pr\left(\left| -\frac{1}{n} \log p(X_1, \ldots, X_n) - H(X) \right| > \delta\right) < \epsilon \tag{10}$$

or equivalently,

$$\Pr\left(\left| -\frac{1}{n} \log p(X_1, \ldots, X_n) - H(X) \right| \leq \delta\right) \geq 1 - \epsilon \tag{11}$$

and hence, with probability greater than or equal to $1 - \epsilon$:

$$-n(H(X) + \delta) \leq \log p(X_1, \ldots, X_n) \leq -n(H(X) - \delta). \tag{12}$$

A '$\delta$-typical $n$-length sequence' $(x_1, \ldots, x_n) \in \mathcal{X}^n$ of values of the random
variables $X_1, \ldots, X_n$ is defined as a sequence of symbols of $\mathcal{X}$ satisfying:

$$2^{-n(H(X) + \delta)} \leq p(x_1, \ldots, x_n) \leq 2^{-n(H(X) - \delta)}. \tag{13}$$

Denote the set of $\delta$-typical $n$-length sequences by $T_\delta^{(n)}$ and the number of sequences
in $T_\delta^{(n)}$ by $|T_\delta^{(n)}|$. Then, for sufficiently large $n$,

$$\Pr((X_1, \ldots, X_n) \in T_\delta^{(n)}) \geq 1 - \epsilon; \tag{14}$$

and it can be shown that

$$(1 - \epsilon) 2^{n(H(X) - \delta)} \leq |T_\delta^{(n)}| \leq 2^{n(H(X) + \delta)}. \tag{15}$$

So, roughly, $T_\delta^{(n)}$ contains $2^{nH(X)}$ equiprobable sequences, each having a probability of
$2^{-nH(X)}$.
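
The statements about the typical set can be illustrated for a binary source. The sketch below assumes a source that emits 1 with probability `p` (with 0 < p < 1) and counts typical sequences by grouping them according to their number of 1s. The function names and the parameter values are hypothetical choices for illustration.

```python
import math

def binary_entropy(p):
    """Shannon entropy in bits of a source emitting 1 with probability p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def typical_set_stats(n, p, delta):
    """Return (log2 of the number of delta-typical n-sequences, their total probability).

    Sequences with the same number k of 1s share one probability, so it is
    enough to loop over k and weight by the binomial coefficient.
    """
    h = binary_entropy(p)
    count, prob = 0, 0.0
    for k in range(n + 1):
        # -(1/n) log2 p(x_1, ..., x_n) for any sequence containing k ones
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - h) <= delta:
            ways = math.comb(n, k)
            count += ways
            # probability mass of all sequences with k ones, computed in log space
            prob += math.exp(math.log(ways) + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.log2(count), prob

n, p, delta = 1000, 0.2, 0.1
h = binary_entropy(p)
log2_size, prob = typical_set_stats(n, p, delta)
print(f"H = {h:.4f} bits, log2|T| / n = {log2_size / n:.4f}, Pr(T) = {prob:.6f}")
```

For these parameters the typical set carries almost all of the probability, while the number of bits needed to index it, per symbol, lies within `delta` of the entropy and well below the 1 bit per symbol of the uncompressed representation.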

Shannon's source coding theorem applies the above result about typical sequences
to show that the compression rate of $H(X)$ bits per symbol produced by a source of
independent and identically distributed random variables is optimal. The source produces
$n$-length sequences of symbols $x_1, x_2, \ldots, x_n$ with probability
$p(x_1, x_2, \ldots, x_n) = p(x_1) p(x_2) \cdots p(x_n)$, where each symbol is chosen from an
alphabet $\mathcal{X}$. If there are $k$ symbols in $\mathcal{X}$, these $n$-sequences can be represented as
sequences of $n \log k$ bits. Suppose there is a 'block coding' compression scheme that
encodes each 'block' or $n$-length sequence (for sufficiently large $n$) as a shorter sequence
of $nR$ bits, where $0 \leq R \leq \log k$. Suppose also that the receiver has a decompression
scheme for decoding sequences of $nR$ bits into sequences of $n$ symbols. Then one speaks of a
compression/decompression scheme of rate $R$.
The source coding theorem states that

if the Shannon entropy of a source is $H(X)$, then there exists a reliable
compression/decompression scheme of rate $R$ if and only if $R > H(X)$,
where a scheme is said to be reliable if it reproduces the original sequence
with a probability that tends to 1 as $n \to \infty$.

For reliable communication, we want the compression and decompression of a
sequence of symbols to yield the original sequence, but in general there will be a certain
probability, $q(x_1, \ldots, x_n)$, of decoding a given sequence of $nR$ encoded bits received
by the receiver as the original $n$-sequence produced by the source. The average fidelity³
of a compression/decompression scheme for $n$-length blocks is defined as:

$$F_n = \sum_{\text{all } n\text{-sequences}} p(x_1, \ldots, x_n) q(x_1, \ldots, x_n) \tag{16}$$

If all the probabilities $q(x_1, \ldots, x_n)$ are 1, $F_n = 1$; otherwise $F_n < 1$. In terms of
the fidelity as a measure of reliability of correct decoding, the source coding theorem
states that

for any $\epsilon, \delta > 0$: (i) there exists a compression/decompression scheme
using $H(X) + \delta$ bits per symbol for $n$-length sequences produced by the
source that can be decompressed by the receiver with a fidelity $F_n > 1 - \epsilon$,
for sufficiently large $n$, and (ii) any compression/decompression scheme
using $H(X) - \delta$ bits per symbol for $n$-length sequences will have a fidelity
$F_n < \epsilon$, for sufficiently large $n$.

³ Note that this definition of fidelity is different from the definition proposed by Nielsen and Chuang
[2000, p. 400] for the fidelity between two probability distributions $\{p_x\}$ and $\{q_x\}$ as a 'distance measure'
between the distributions. They define $F_{NC}(p_x, q_x) := \sum_x \sqrt{p_x q_x}$, so $F_{NC}(p_x, q_x) = 1$ if $p_x = q_x$.

As a simple example of compression, consider an information source that produces
sequences of symbols from a 4-symbol alphabet $a_1, a_2, a_3, a_4$ with probabilities 1/2,
1/4, 1/8, 1/8. Each symbol can be represented by a distinct 2-digit binary number:

    $a_1$    00
    $a_2$    01
    $a_3$    10
    $a_4$    11

so without compression we need two bits per symbol of storage space to store the
output of the source. The Shannon entropy of the source is
$H(X) = -\frac{1}{2} \log \frac{1}{2} - \frac{1}{4} \log \frac{1}{4} - \frac{1}{8} \log \frac{1}{8} - \frac{1}{8} \log \frac{1}{8} = \frac{7}{4}$.
Shannon's source coding theorem tells us that there
is a compression scheme that uses an average of 7/4 bits per symbol rather than two
bits per symbol, and that such a compression scheme is optimal. The optimal scheme
is provided by the following encoding:

    $a_1$    0
    $a_2$    10
    $a_3$    110
    $a_4$    111

for which the average length of a compressed sequence is
$\frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{8} \cdot 3 = \frac{7}{4}$
bits per symbol.
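
The arithmetic of this example, and the fact that the variable-length code can be decoded unambiguously, can be checked directly. This is a small sketch; the dictionary layout, the symbol labels `"a1"` to `"a4"`, and the helper names `encode` and `decode` are assumptions made for illustration.

```python
import math

# Source probabilities and codewords from the four-symbol example.
probs = {"a1": 1 / 2, "a2": 1 / 4, "a3": 1 / 8, "a4": 1 / 8}
code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}

entropy = -sum(p * math.log2(p) for p in probs.values())
average_length = sum(probs[s] * len(code[s]) for s in probs)

def encode(symbols):
    """Concatenate the codewords of a sequence of symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits):
    """Decode greedily; this works because no codeword is a prefix of another."""
    inverse = {word: sym for sym, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ends in an incomplete codeword")
    return symbols

message = ["a1", "a3", "a1", "a2", "a4", "a1"]
bits = encode(message)
print(entropy, average_length, bits, decode(bits) == message)
```

The average codeword length coincides with the entropy here because every probability is a power of 1/2; for general distributions a symbol-by-symbol code of this kind only approaches the entropy, which is why the theorem is stated for long blocks.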

The significance of Shannon's source coding theorem lies in showing that there is
an optimal or most efficient way of compressing messages produced by a source (as- 
suming a certain idealization) in such a way that they can be reliably reconstructed 
by a receiver. Since a message is abstracted as a sequence of distinguishable symbols 
produced by a stochastic source, the only relevant feature of a message with respect 
to reliable compression and decompression is the sequence of probabilities associated 
with the individual symbols: the nature of the physical systems embodying the repre- 
sentation of the message through their states is irrelevant to this notion of compression 
(provided only that the states are reliably distinguishable), as is the content or meaning 
of the message. The Shannon entropy H(X) is a measure of the minimal physical 
resources, in terms of the average number of bits per symbol, that are necessary and 
sufficient to reliably store the output of a source of messages. In this sense, it is a 
measure of the amount of information per symbol produced by an information source. 

The essential notion underlying Shannon's measure of information is compressibil- 
ity: information as a physical resource is something that can be compressed, and the 
amount of information produced by an information source is measured by its optimal 
compressibility. 

2.2 Conditional Entropy, Mutual Information, Channel Capacity 

The analysis so far assumes a noiseless channel between the source and the receiver. I 
turn now to a brief sketch of some concepts relevant to a noisy channel, and a statement 
of Shannon's noisy channel coding theorem. 

An information channel maps inputs consisting of values of a random variable $X$
onto outputs consisting of values of a random variable $Y$, and the map will generally
not be 1-1 if the channel is noisy. Consider the conditional probabilities $p(y|x)$ of
obtaining an output value $y$ for a given input value $x$, for all $x, y$. From the probabilities
$p(x)$ we can calculate $p(y)$ as:

$$p(y) = \sum_x p(y|x) p(x)$$

and we can also calculate $p(x|y)$ by Bayes' rule from the probabilities $p(y|x)$ and $p(x)$,
for all $x, y$, and hence the Shannon entropy of the conditional distribution $p(x|y)$, for
all $x$ and a fixed $y$, denoted by $H(X|Y = y)$.
The quantity

$$H(X|Y) = \sum_y p(y) H(X|Y = y) \tag{17}$$

is known as the conditional entropy. It is the expected value of $H(X|Y = y)$ for all $y$.
If we think of $H(X)$, the entropy of the distribution $\{p(x) : x \in \mathcal{X}\}$, as a measure of
the uncertainty of the $X$-value, then $H(X|Y = y)$ is a measure of the uncertainty of
the $X$-value, given the $Y$-value $y$, and $H(X|Y)$ is a measure of the average uncertainty
of the $X$-value, given a $Y$-value.

Putting it differently, the number of input sequences of length $n$ that are consistent
with a given output sequence (as $n \to \infty$) is $2^{nH(X|Y)}$, i.e., $H(X|Y)$ is the number of
bits per symbol of additional information needed, on average, to identify an input
$X$-sequence from a given $Y$-sequence. This follows because there are $2^{nH(X,Y)}$ typical
sequences of pairs $(x, y)$, where the joint entropy $H(X, Y)$ is calculated from the joint
probability $p(x, y)$, and $2^{nH(Y)}$ typical $Y$-sequences. So there are

$$\frac{2^{nH(X,Y)}}{2^{nH(Y)}} = 2^{n(H(X,Y) - H(Y))} = 2^{nH(X|Y)} \tag{18}$$

typical $X$-sequences associated with a given $Y$-sequence.
The 'chain rule' equality

$$H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y) = H(Y, X) \tag{19}$$

follows immediately from the logarithmic definitions of the quantities:

$$\begin{aligned}
H(X, Y) &:= -\sum_{x,y} p(x, y) \log p(x, y) \\
&= -\sum_{x,y} p(x) p(y|x) \log (p(x) p(y|x)) \\
&= -\sum_{x,y} p(x) p(y|x) \log p(x) - \sum_{x,y} p(x) p(y|x) \log p(y|x) \\
&= -\sum_x p(x) \log p(x) + \sum_x p(x) \Big( -\sum_y p(y|x) \log p(y|x) \Big) \\
&= H(X) + H(Y|X)
\end{aligned} \tag{20}$$

Note that, in general, $H(X|Y) \neq H(Y|X)$.
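
The chain rule, and the asymmetry between the two conditional entropies, can be verified numerically on a small joint distribution. The sketch below assumes a hypothetical channel that flips a biased input bit with probability 0.1. The distribution and the variable names are illustrative only.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical noisy channel: input X is a biased bit, flipped with probability 0.1.
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

# Joint distribution p(x, y) = p(x) p(y|x) and output marginal p(y) = sum_x p(y|x) p(x).
p_xy = {(x, y): p_x[x] * p_y_given_x[x][y] for x in p_x for y in (0, 1)}
p_y = {y: sum(p_xy[(x, y)] for x in p_x) for y in (0, 1)}

# H(Y|X) = sum_x p(x) H(Y|X = x), and likewise H(X|Y) using p(x|y) = p(x, y) / p(y).
h_y_given_x = sum(p_x[x] * entropy(p_y_given_x[x]) for x in p_x)
h_x_given_y = sum(
    p_y[y] * entropy({x: p_xy[(x, y)] / p_y[y] for x in p_x}) for y in p_y
)

h_xy, h_x, h_y = entropy(p_xy), entropy(p_x), entropy(p_y)
print(h_xy, h_x + h_y_given_x, h_y + h_x_given_y)
```

Both decompositions reproduce the joint entropy, while the two conditional entropies themselves differ because the input distribution is not uniform.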

The mutual information measures the average amount of information gained about
$X$ by ascertaining a $Y$-value, i.e., the amount of information one random variable
contains about another, or the reduction in uncertainty of one random variable obtained
by measuring another.

Mutual information can be defined in terms of the concept of relative entropy,
which is a measure of something like the distance between two probability
distributions (although it is not a true metric, since it is not symmetric and does not satisfy
the triangle inequality). The relative entropy between distributions $p(x)$ and $q(x)$ is
defined as:

$$D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}. \tag{21}$$
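
The lack of symmetry that prevents the relative entropy from being a metric is easy to exhibit. In this minimal sketch the two distributions `p` and `q` are arbitrary illustrative choices, and the function assumes that `q` assigns nonzero probability wherever `p` does.

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

d_pq = relative_entropy(p, q)
d_qp = relative_entropy(q, p)
print(d_pq, d_qp)
```

The two values differ, and the relative entropy of a distribution with respect to itself is zero.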
The mutual information can now be defined as:

$$H(X:Y) = D(p(x, y) \| p(x) p(y)) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}. \tag{22}$$

It follows that 

H(X:Y) = H(X) - H(X\Y) = H(Y) - H(Y\X), (23) 

i.e., the mutual information of two random variables represents the average informa- 
tion gain about one random variable obtained by measuring the other: the difference 
between the initial uncertainty of one of the random variables, and the average residual 
uncertainty of that random variable after ascertaining the value of the other random 
variable. Also, since $H(X, Y) = H(X) + H(Y|X)$, it follows that

H(X:Y) = H(X) + H(Y)-H(X,Y); (24) 

i.e., the mutual information of two random variables is a measure of how much infor- 
mation they have in common: the sum of the information content of the two random 
variables, as measured by the Shannon entropy (in which joint information is counted 
twice), minus their joint information. Note that H(X : X) = H(X), as we would 
expect. 
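
The equivalence of the three expressions for the mutual information can be checked on a small example. The sketch below computes all three from a joint distribution. The function name `mutual_information` and both joint distributions are hypothetical choices for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_xy):
    """Return H(X:Y) computed three ways from a joint distribution {(x, y): p}."""
    xs = sorted({x for x, _ in p_xy})
    ys = sorted({y for _, y in p_xy})
    p_x = {x: sum(p_xy.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(p_xy.get((x, y), 0.0) for x in xs) for y in ys}
    h_x, h_y, h_xy = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())
    # Relative entropy between the joint distribution and the product of marginals.
    as_divergence = sum(
        p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0
    )
    as_uncertainty_reduction = h_x - (h_xy - h_y)   # H(X) - H(X|Y)
    as_shared_information = h_x + h_y - h_xy        # H(X) + H(Y) - H(X,Y)
    return as_divergence, as_uncertainty_reduction, as_shared_information

# Hypothetical joint distribution of two correlated bits.
noisy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Perfectly correlated copy, so that Y = X and H(X:X) = H(X) = 1 bit.
copy = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(noisy))
print(mutual_information(copy))
```

For the perfectly correlated pair all three expressions return the entropy of a single fair bit, in line with $H(X:X) = H(X)$.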

For a noisy channel, if X represents the input to the channel and Y represents the 
output of the channel, H(X : Y) represents the average amount of information gained 
about the input $X$ by ascertaining the value of the output $Y$. The capacity of a channel,
$C$, is defined as the supremum of $H(X:Y)$ over all input distributions.
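
As an illustration of this definition, the capacity of a binary symmetric channel can be approximated by searching over input distributions. The binary symmetric channel is not discussed in the text. It is assumed here as a standard example in which each bit is flipped independently with probability `flip`, and its known capacity is $1 - H_2(\text{flip})$, where $H_2$ is the binary entropy. The function names are hypothetical.

```python
import math

def h2(p):
    """Binary Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def bsc_mutual_information(a, flip):
    """H(X:Y) = H(Y) - H(Y|X) for input Pr(X = 1) = a on a channel flipping bits with probability flip."""
    p_y1 = a * (1 - flip) + (1 - a) * flip
    return h2(p_y1) - h2(flip)

def capacity_by_grid_search(flip, steps=1000):
    """Approximate the supremum of H(X:Y) over input distributions on a grid."""
    return max(bsc_mutual_information(i / steps, flip) for i in range(steps + 1))

flip = 0.1
capacity = capacity_by_grid_search(flip)
print(capacity, 1 - h2(flip))
```

The maximum is attained at the uniform input distribution, and a channel that flips bits half the time has zero capacity.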

Shannon's noisy channel coding theorem shows, perhaps surprisingly, that up to $C$
bits of information can be sent through a noisy channel with arbitrarily low error rate.
That is,

there exists an optimal coding for an information source with entropy $H < C$
such that $n$-length sequences produced by the source can be transmitted
faithfully over the channel: the error rate tends to zero as $n \to \infty$. The
probability of error tends to 1 if we attempt to transmit more than $C$ bits
through the channel.

This means that there are two ways of improving the transmission rate over a noisy 
channel such as a telephone cable. We can improve the channel capacity by replacing 
the cable with a faster one, or we can improve the information processing (the data 
compression). 



3 Quantum Information 

The physical notion of information, discussed in §2, is profoundly transformed by the
transition from classical mechanics to quantum mechanics. The aim of this section
is to bring out the nature of this transformation. In §3.1, I develop some core
concepts of quantum mechanics relevant to quantum information: entangled states, the
Schmidt decomposition, the density operator formalism for the representation of pure 
and mixed states, the 'purification' of mixed states, generalized quantum measurements 
in terms of positive operator valued measures (POVMs), and the evolution of open sys- 
tems represented by quantum operations. I assume throughout Hilbert spaces of finite 
dimension (and so avoid all the technicalities of functional analysis required for the 
treatment of infinite-dimensional Hilbert spaces). In fact, there is no loss of generality 
here, since both classical and quantum information sources are considered to produce 
messages consisting of sequences of symbols from some finite alphabet, which we rep- 
resent in terms of a finite set of classical or quantum states. Moreover, all the concep- 
tual issues relevant to the difference between classical and quantum information show 
up in finite-dimensional Hilbert spaces. In §3.2, I introduce von Neumann's general- 
ization of the Shannon entropy and related notions for quantum information. In §3.3 
and §3.4, I discuss some salient features that distinguish quantum information from 
classical information: §3.3 deals with the limitations on copying quantum information 
imposed by the 'no cloning' theorem, and §3.4 deals with the limited accessibility of 
quantum information defined by the Holevo bound. Finally, in §3.5 I show how the 
notion of compressibility applies to quantum information, and I outline Schumacher's 
generalization of Shannon's source coding theorem for quantum information, noting a 
distinction between 'visible' and 'blind' compression applicable to quantum informa- 
tion. 

3.1 Some Relevant Quantum Mechanics 
3.1.1 Entangled States 

Consider a quantum system $Q$ which is part of a compound system $QE$; $E$ for
'environment,' although $E$ could be any quantum system of which $Q$ is a subsystem. Pure
states of $QE$ are represented as rays or unit vectors in a tensor product Hilbert space
$\mathcal{H}^Q \otimes \mathcal{H}^E$. A general pure state of $QE$ is a state of the form:

$$|\Psi\rangle = \sum_{i,j} c_{ij} |q_i\rangle |e_j\rangle \tag{25}$$

where $|q_i\rangle \in \mathcal{H}^Q$ is a complete set of orthonormal states (a basis) in $\mathcal{H}^Q$ and
$|e_j\rangle \in \mathcal{H}^E$ is a basis in $\mathcal{H}^E$. If the coefficients $c_{ij}$ are such that $|\Psi\rangle$ cannot be
expressed as a product state $|Q\rangle|E\rangle$, then $|\Psi\rangle$ is called an entangled state.

For any state $|\Psi\rangle$ of $QE$, there exist orthonormal bases $|i\rangle \in \mathcal{H}^Q$, $|j\rangle \in \mathcal{H}^E$ such
that $|\Psi\rangle$ can be expressed in a biorthogonal correlated form as:

$$|\Psi\rangle = \sum_i \sqrt{p_i} |i\rangle |i\rangle \tag{26}$$

where the coefficients $\sqrt{p_i}$ are real and non-negative, and $\sum_i p_i = 1$. This
representation is referred to as the Schmidt decomposition. The Schmidt decomposition is unique
if and only if the $p_i$ are all distinct.
An example is the biorthogonal EPR state⁴

$$|\Psi\rangle = (|0\rangle|1\rangle - |1\rangle|0\rangle)/\sqrt{2}; \tag{27}$$

say, the singlet state of two spin-1/2 particles (the Schmidt form with positive
coefficients is obtained by absorbing the relative phases in the definition of the basis vectors).
In the singlet state, $|0\rangle$ and $|1\rangle$ can be taken as representing the two eigenstates of spin
in the $z$-direction, but since the state is rotationally symmetric, $|\Psi\rangle$ retains the same form
for spin in any direction. The EPR argument exploits the fact that spin measurements in the
same direction on the two particles, which could be arbitrarily far apart, will yield
outcomes that are perfectly anti-correlated for any spin direction. Bell's counterargument
exploits the fact that when the spin is measured on one particle in a direction at angle $\theta_1$
to the $z$-axis, but on the other particle in a direction at angle $\theta_2$ to the $z$-axis, the
probability of finding the same outcome for both particles (both 1 or both 0) is
$\sin^2((\theta_1 - \theta_2)/2)$. It follows that
the outcomes are perfectly correlated when $\theta_1 - \theta_2 = \pi$ and that 3/4 of the outcomes
are the same when $\theta_1 - \theta_2 = 2\pi/3$. On the other hand, from Bell's inequality, derived
under Einstein's realist assumptions of separability and locality, we see that the
correlation for $\theta_1 - \theta_2 = 2\pi/3$ cannot exceed 2/3. See Dickson (this vol., ch. 4) for further
discussion.
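
The quantum probabilities quoted here can be recomputed from the singlet state itself. The sketch below assumes that both measurement directions lie in a single plane containing the $z$-axis, so that the spin eigenstates have real amplitudes. The function names and the labelling of the two outcomes as 'up' and 'down' are illustrative assumptions.

```python
import math

def spin_up(theta):
    """Amplitudes (on |0>, |1>) of the 'up' spin eigenstate at angle theta to the z-axis."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def spin_down(theta):
    """Amplitudes of the orthogonal 'down' eigenstate for the same direction."""
    return (-math.sin(theta / 2), math.cos(theta / 2))

def singlet_amplitude(a, b):
    """Overlap <a|<b|Psi> with |Psi> = (|0>|1> - |1>|0>)/sqrt(2), for real single-particle states."""
    return (a[0] * b[1] - a[1] * b[0]) / math.sqrt(2)

def prob_same_outcome(theta1, theta2):
    """Probability that both particles give 'up' or both give 'down'."""
    both_up = singlet_amplitude(spin_up(theta1), spin_up(theta2)) ** 2
    both_down = singlet_amplitude(spin_down(theta1), spin_down(theta2)) ** 2
    return both_up + both_down

for difference in (0.0, 2 * math.pi / 3, math.pi):
    print(difference, prob_same_outcome(difference, 0.0))
```

The three angle differences reproduce perfect anti-correlation, agreement in 3/4 of the cases, and perfect correlation, respectively.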

This means that the dynamical evolution of a quantum system can result in a state
representing correlational information that no classical computer can simulate. That is,
no classical computer can be programmed to perform the following task: for any pair
of input angles, $\theta_1, \theta_2$, at different locations, output a pair of values (0 or 1) for these
locations such that the values are perfectly correlated when $\theta_1 - \theta_2 = \pi$, perfectly
anti-correlated when $\theta_1 = \theta_2$, and 75% correlated when $\theta_1 - \theta_2 = 2\pi/3$, where the
response time between being given the input and producing the output in each case is
less than the time taken by light to travel between the two locations.

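Notice that the four states:

\begin{align}
|00\rangle &= \frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle) \tag{28} \\
|01\rangle &= \frac{1}{\sqrt{2}}(|0\rangle|1\rangle + |1\rangle|0\rangle) \tag{29} \\
|10\rangle &= \frac{1}{\sqrt{2}}(|0\rangle|0\rangle - |1\rangle|1\rangle) \tag{30} \\
|11\rangle &= \frac{1}{\sqrt{2}}(|0\rangle|1\rangle - |1\rangle|0\rangle) \tag{31}
\end{align}

form an orthonormal basis, called the Bell basis, in the $2 \times 2$-dimensional Hilbert space.
Any Bell state can be transformed into any other Bell state by a local unitary transformation,
X, Y, or Z, where X, Y, Z are the Pauli spin matrices: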

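\begin{align}
X &= \sigma_x = |0\rangle\langle 1| + |1\rangle\langle 0| = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \tag{32} \\
Y &= \sigma_y = -i|0\rangle\langle 1| + i|1\rangle\langle 0| = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \tag{33} \\
Z &= \sigma_z = |0\rangle\langle 0| - |1\rangle\langle 1| = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{34}
\end{align}

4 Einstein, Podolsky and Rosen considered a more complicated state entangled over position and
momentum values. The spin example is due to Bohm [1951, pp. 611-623].

For example:

\[ X \otimes I \cdot \frac{1}{\sqrt{2}}(|0\rangle|1\rangle - |1\rangle|0\rangle) = -\frac{1}{\sqrt{2}}(|0\rangle|0\rangle - |1\rangle|1\rangle). \tag{35} \]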
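
The claim that a local Pauli operation takes any Bell state to any other can be checked
directly. The sketch below is illustrative only: it stores two-qubit states as amplitude
lists in the order $|0\rangle|0\rangle, |0\rangle|1\rangle, |1\rangle|0\rangle, |1\rangle|1\rangle$,
and the helper names `apply_to_first` and `same_up_to_phase` are assumptions of this example.

```python
import math

R = 1 / math.sqrt(2)
# Bell states (28)-(31) as amplitudes over |0>|0>, |0>|1>, |1>|0>, |1>|1>.
BELL = {
    "00": [R, 0, 0, R],
    "01": [0, R, R, 0],
    "10": [R, 0, 0, -R],
    "11": [0, R, -R, 0],
}
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def apply_to_first(op, state):
    """Apply the 2x2 operator `op` to the first qubit (op tensor identity)."""
    out = [0j] * 4
    for a in (0, 1):          # new value of the first qubit
        for b in (0, 1):      # the second qubit is untouched
            out[2 * a + b] = sum(op[a][k] * state[2 * k + b] for k in (0, 1))
    return out


def same_up_to_phase(u, v, tol=1e-12):
    """True if u and v represent the same ray, i.e. |<u|v>| = 1."""
    overlap = sum(complex(x).conjugate() * y for x, y in zip(u, v))
    return abs(abs(overlap) - 1) < tol


# Eq. (35): X on the first qubit maps the state (31) to minus the state (30).
print(apply_to_first(X, BELL["11"]))
```

The overall sign (or phase) that appears, as in Eq. (35), has no physical significance,
which is why the comparison is made up to phase.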



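If QE is a closed system in an entangled pure state represented by

\[ |\Psi\rangle = \sum_i \sqrt{p_i}\,|i\rangle|i\rangle \tag{36} \]

in the Schmidt decomposition, the expected value of any Q-observable A on $\mathcal{H}^Q$ can
be computed as:

\begin{align}
\langle A \rangle &= \mathrm{Tr}(|\Psi\rangle\langle\Psi|\, A \otimes I) \\
&= \mathrm{Tr}_Q(\mathrm{Tr}_E(|\Psi\rangle\langle\Psi| A)) \\
&= \mathrm{Tr}_Q\Big(\sum_i p_i |i\rangle\langle i| A\Big) \\
&= \mathrm{Tr}_Q(\rho A) \tag{37}
\end{align}

where $\mathrm{Tr}_Q(\cdot) = \sum_i \langle q_i| \cdot |q_i\rangle$, for any orthonormal basis
$\{|q_i\rangle\}$ in $\mathcal{H}^Q$, is the partial trace over $\mathcal{H}^Q$, $\mathrm{Tr}_E(\cdot)$
is the partial trace over $\mathcal{H}^E$, and $\rho = \sum_i p_i |i\rangle\langle i|$ on
$\mathcal{H}^Q$ is the reduced density operator of the open system Q, a positive operator with unit
trace. Since the density operator $\rho$ yields the statistics of all Q-observables via Eq. (37),
$\rho$ is taken as representing the quantum state of the system Q.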

If QE is an entangled pure state, then the open system Q is in a mixed state $\rho$, i.e.,
$\rho \neq \rho^2$; for pure states, $\rho$ is a projection operator onto a ray and $\rho = \rho^2$.
A mixed state represented by a density operator $\rho = \sum_i p_i |i\rangle\langle i|$ can be
regarded as a mixture of pure states $|i\rangle$ prepared with prior probabilities $p_i$, but this
representation is not unique, not even if the states combined in the mixture are orthogonal. For
example, the equal-weight mixture of orthonormal states $|0\rangle, |1\rangle$ in a 2-dimensional
Hilbert space $\mathcal{H}_2$ has precisely the same statistical properties, and hence the same
density operator $\rho = I/2$, as the equal-weight mixture of any pair of orthonormal states, e.g.,
the states $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$,
or the equal-weight mixture of nonorthogonal states $|0\rangle$,
$\frac{1}{2}|0\rangle + \frac{\sqrt{3}}{2}|1\rangle$, $\frac{1}{2}|0\rangle - \frac{\sqrt{3}}{2}|1\rangle$
120° apart, or the uniform continuous distribution over all possible states in $\mathcal{H}_2$.
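
As a quick numerical check of this non-uniqueness, the sketch below builds the density
matrix of three of the mixtures just mentioned and confirms that each equals $I/2$. It is a
minimal illustration restricted to real amplitudes; the function name `density` and the
variable names are assumptions of the example.

```python
import math


def density(ensemble):
    """2x2 density matrix of a list of (weight, real 2-vector) pairs."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for weight, vec in ensemble:
        for i in range(2):
            for j in range(2):
                rho[i][j] += weight * vec[i] * vec[j]
    return rho


r = 1 / math.sqrt(2)
h = math.sqrt(3) / 2

z_mix = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]                   # |0>, |1>
x_mix = [(0.5, (r, r)), (0.5, (r, -r))]                          # (|0> +/- |1>)/sqrt(2)
trine = [(1 / 3, (1.0, 0.0)), (1 / 3, (0.5, h)), (1 / 3, (0.5, -h))]  # 120 degrees apart

for name, mix in (("z", z_mix), ("x", x_mix), ("trine", trine)):
    print(name, density(mix))
```

The three states in `trine` are separated by 120° on the Bloch sphere, which corresponds to
pairwise overlaps of magnitude 1/2 in Hilbert space.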

More generally, for any basis of orthonormal states $|e_j\rangle \in \mathcal{H}^E$, the entangled
state $|\Psi\rangle$ can be expressed as:

\[ |\Psi\rangle = \sum_{ij} c_{ij}\,|q_i\rangle|e_j\rangle = \sum_j \sqrt{w_j}\,|r_j\rangle|e_j\rangle \tag{38} \]

where the normalized states $|r_j\rangle = \sum_i \frac{c_{ij}}{\sqrt{w_j}}|q_i\rangle$ are relative
states to the $|e_j\rangle$ ($w_j = \sum_i |c_{ij}|^2$). Note that the states $|r_j\rangle$ are not in
general orthogonal. Since the $|e_j\rangle$ are orthogonal, we can express the density operator
representing the state of Q as:

\[ \rho = \sum_j w_j\,|r_j\rangle\langle r_j|. \tag{39} \]

In effect, a measurement of an E-observable with eigenstates $|e_j\rangle$ will leave the
composite system QE in one of the states $|r_j\rangle|e_j\rangle$ with probability $w_j$, and a
measurement of an E-observable with eigenstates $|i\rangle$ (the orthogonal states of the Schmidt
decomposition in (36) above) will leave the system QE in one of the states $|i\rangle|i\rangle$ with
probability $p_i$. Since Q and E could be widely separated from each other in space,
no measurement at E could affect the statistics of any Q-observable; or else measurements
at E would allow superluminal signaling between Q and E. It follows that the
mixed state $\rho$ can be realized as a mixture of orthogonal states $|i\rangle$ (the eigenstates of
$\rho$) with weights $p_i$, or as a mixture of non-orthogonal relative states $|r_j\rangle$ with
weights $w_j$ in infinitely many ways, depending on the choice of basis in $\mathcal{H}^E$:

\[ \rho = \sum_i p_i\,|i\rangle\langle i| = \sum_j w_j\,|r_j\rangle\langle r_j| \tag{40} \]

and all these different mixtures with the same density operator $\rho$ must be physically
indistinguishable.
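
The dependence of the relative states on the basis chosen at E can be made concrete with a
small calculation. The sketch below is an illustration under stated assumptions: real
amplitudes, a qubit at each end, hypothetical Schmidt weights 0.8 and 0.2, and an E basis
rotated by a hypothetical angle of 0.3 radians. The function names are not from the text.

```python
import math


def relative_states(c):
    """Weights w_j and normalized relative states |r_j> for real coefficients
    c[i][j] of |Psi> = sum_ij c_ij |q_i>|e_j>. Assumes every w_j is nonzero."""
    n_q, n_e = len(c), len(c[0])
    weights, states = [], []
    for j in range(n_e):
        w = sum(c[i][j] ** 2 for i in range(n_q))
        weights.append(w)
        states.append([c[i][j] / math.sqrt(w) for i in range(n_q)])
    return weights, states


def mixture(weights, states):
    """Density matrix sum_j w_j |r_j><r_j| for real state vectors."""
    n = len(states[0])
    return [[sum(w * s[i] * s[k] for w, s in zip(weights, states)) for k in range(n)]
            for i in range(n)]


def rotate_e_basis(c, phi):
    """Coefficients of the same state with respect to an E basis rotated by phi."""
    cs, sn = math.cos(phi), math.sin(phi)
    return [[row[0] * cs + row[1] * sn, -row[0] * sn + row[1] * cs] for row in c]


schmidt = [[math.sqrt(0.8), 0.0], [0.0, math.sqrt(0.2)]]   # Schmidt form, p = (0.8, 0.2)
p, eigenstates = relative_states(schmidt)
w, r = relative_states(rotate_e_basis(schmidt, 0.3))

rho_schmidt = mixture(p, eigenstates)
rho_rotated = mixture(w, r)
overlap = sum(a * b for a, b in zip(r[0], r[1]))   # nonzero: the |r_j> are not orthogonal
print(rho_schmidt, rho_rotated, overlap)
```

Both ensembles yield the same density matrix even though the relative states obtained from
the rotated basis are not orthogonal.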

Note that any mixed state density operator $\rho$ on $\mathcal{H}^Q$ can be 'purified' by adding
a suitable ancilla system E, in the sense that $\rho$ is the partial trace of a pure state
$|\Psi\rangle \in \mathcal{H}^Q \otimes \mathcal{H}^E$ over $\mathcal{H}^E$. A purification of a mixed
state is, clearly, not unique, but depends on the choice of $|\Psi\rangle$ in
$\mathcal{H}^Q \otimes \mathcal{H}^E$. The Hughston-Jozsa-Wootters theorem
[Hughston et al., 1993] shows that for any mixture of pure states $|r_j\rangle$ with weights
$w_j$, where $\rho = \sum_j w_j |r_j\rangle\langle r_j|$, there is a purification of $\rho$ and a
suitable measurement on the system E that will leave Q in the mixture $\rho$. So an observer at E
can remotely prepare Q in any mixture that corresponds to the density operator $\rho$ (and of
course all these different mixtures are physically indistinguishable). Similar results were proved
earlier by Schrödinger [1936], Jaynes [1957] and Gisin [1989]. See Halvorson [2004]
for a generalization to hyperfinite von Neumann algebras.

3.1.2 Measurement 

A standard von Neumann 'yes-no' measurement is associated with a projection operator;
so a standard observable is represented in the spectral representation as a sum of
projection operators, with coefficients representing the eigenvalues of the observable.
Such a measurement is the quantum analogue of the measurement of a property of a
system in classical physics. Classically, we think of a property of a system as being
associated with a subset in the state space (phase space) of the system, and determining
whether the system has the property amounts to determining whether the state of the
system lies in the corresponding subset. In quantum mechanics, the counterpart of a
subset in phase space is a closed linear subspace in Hilbert space. Just as the different
possible values of an observable (dynamical quantity) of a classical system correspond
to the subsets in a mutually exclusive and collectively exhaustive set of subsets covering
the classical state space, so the different values of a quantum observable correspond to
the subspaces in a mutually exclusive (i.e., orthogonal) and collectively exhaustive set
of subspaces spanning the quantum state space. (For further discussion, see Dickson,
this vol., ch. 4, [Mackey, 1963], and [Bub, 1997].)

In quantum mechanics, and especially in the theory of quantum information (where 
any read-out of the quantum information encoded in a quantum state requires a quan- 
tum measurement), it is useful to consider a more general class of measurements than 
the projective measurements associated with the determination of the value of an ob- 
servable. It is common to speak of generalized measurements and generalized observ- 
ables. But in fact this terminology is more misleading than illuminating, because a 
generalized measurement is not a procedure that reveals whether or not a quantum sys- 
tem has some sort of generalized property. Rather, the point of the generalization is to 
exploit the difference between quantum and classical states for new possibilities in the 
representation and manipulation of information. 

To clarify the idea, I will follow the excellent discussion by Nielsen and Chuang
[2000, §2.2.3-2.2.6]. A quantum measurement can be characterized, completely generally,
as a certain sort of interaction between two quantum systems, Q (the measured
system) and M (the measuring system). We suppose that Q is initially in a state $|\psi\rangle$
and that M is initially in some standard state $|0\rangle$, where $\{|m\rangle\}$ is an orthonormal
basis of 'pointer' eigenstates in $\mathcal{H}^M$. The interaction is defined by a unitary
transformation U on the Hilbert space $\mathcal{H}^Q \otimes \mathcal{H}^M$ that yields the transition:

\[ |\psi\rangle|0\rangle \xrightarrow{U} \sum_m M_m|\psi\rangle|m\rangle \tag{41} \]

where $\{M_m\}$ is a set of linear operators (the Kraus operators) defined on $\mathcal{H}^Q$
satisfying the completeness condition:

\[ \sum_m M_m^\dagger M_m = I. \tag{42} \]

(The symbol $\dagger$ denotes the adjoint or Hermitian conjugate.) The completeness condition
guarantees that this evolution is unitary, because it guarantees that U preserves
inner products, i.e.

\begin{align}
\langle\phi|\langle 0|U^\dagger U|\psi\rangle|0\rangle &= \sum_{m,m'} \langle m|\langle\phi|M_m^\dagger M_{m'}|\psi\rangle|m'\rangle \\
&= \sum_m \langle\phi|M_m^\dagger M_m|\psi\rangle \\
&= \langle\phi|\psi\rangle \tag{43}
\end{align}

from which it follows that U, defined as above by Eq. (41) for any product state
$|\psi\rangle|0\rangle$ (for any $|\psi\rangle \in \mathcal{H}^Q$), can be extended to a unitary
operator on the Hilbert space $\mathcal{H}^Q \otimes \mathcal{H}^M$. Accordingly, any set of linear
operators $\{M_m\}$ defined on the Hilbert space of the system Q satisfying the completeness
condition defines a measurement in this general sense, with the index m labeling the possible
outcomes of the measurement, and any such set is referred to as a set of measurement operators.

If we now perform a standard projective measurement on M to determine the value
m of the pointer observable, defined by the projection operator

\[ P_m = I_Q \otimes |m\rangle\langle m| \]

then the probability of obtaining the outcome m is, by (37):⁵

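\begin{align}
p(m) &= \langle 0|\langle\psi|U^\dagger P_m U|\psi\rangle|0\rangle \\
&= \sum_{m'm''} \langle m'|\langle\psi|M_{m'}^\dagger (I_Q \otimes |m\rangle\langle m|) M_{m''}|\psi\rangle|m''\rangle \\
&= \sum_{m'm''} \langle\psi|M_{m'}^\dagger \langle m'|m\rangle\langle m|m''\rangle M_{m''}|\psi\rangle \\
&= \langle\psi|M_m^\dagger M_m|\psi\rangle; \tag{44}
\end{align}

and, more generally, if the initial state of Q is a mixed state $\rho$, then

\[ p(m) = \mathrm{Tr}_Q(M_m \rho M_m^\dagger). \tag{45} \]

The final state of QM after the projective measurement on M yielding the outcome m is:

\[ \frac{P_m U|\psi\rangle|0\rangle}{\sqrt{\langle 0|\langle\psi|U^\dagger P_m U|\psi\rangle|0\rangle}} = \frac{M_m|\psi\rangle|m\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}. \]

So the final state of M is $|m\rangle$ and the final state of Q is:

\[ \frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}; \tag{46} \]

and, more generally, if the initial state of Q is a mixed state $\rho$, then the final state of Q
is:

\[ \frac{M_m \rho M_m^\dagger}{\mathrm{Tr}_Q(M_m \rho M_m^\dagger)}. \]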
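
The rules in Eqs. (42), (45) and the final-state formula can be exercised on a small example.
The following sketch is illustrative only: it is restricted to real matrices (so the adjoint
is the transpose), and the two measurement operators `M0`, `M1` are hypothetical values chosen
so that the completeness condition holds, not operators taken from the text.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    # Real matrices only in this sketch, so the adjoint is the transpose.
    return [list(col) for col in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def is_complete(ops, tol=1e-12):
    """Check the completeness condition sum_m M_m^dagger M_m = I (Eq. 42)."""
    n = len(ops[0])
    total = [[0.0] * n for _ in range(n)]
    for m in ops:
        mm = matmul(dagger(m), m)
        total = [[total[i][j] + mm[i][j] for j in range(n)] for i in range(n)]
    return all(abs(total[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))


def measure(rho, ops):
    """Return (p(m), post-measurement state) for each outcome m."""
    results = []
    for m in ops:
        unnormalized = matmul(matmul(m, rho), dagger(m))   # M_m rho M_m^dagger
        p = trace(unnormalized)                             # Eq. (45)
        post = [[x / p for x in row] for row in unnormalized] if p > 0 else None
        results.append((p, post))
    return results


# Hypothetical two-outcome generalized measurement on a qubit.
M0 = [[math.sqrt(0.9), 0.0], [0.0, math.sqrt(0.2)]]
M1 = [[math.sqrt(0.1), 0.0], [0.0, math.sqrt(0.8)]]
rho_plus = [[0.5, 0.5], [0.5, 0.5]]   # the pure state (|0> + |1>)/sqrt(2)

outcomes = measure(rho_plus, [M0, M1])
print(is_complete([M0, M1]), [p for p, _ in outcomes])
```

Neither `M0` nor `M1` is a projection operator, yet together they define a legitimate
measurement in the general sense because they satisfy Eq. (42).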

Note that this general notion of measurement covers the case of standard projective
measurements. In this case $\{M_m\} = \{P_m\}$, where $\{P_m\}$ is the set of projection
operators defined by the spectral measure of a standard quantum observable represented
by a self-adjoint operator. It also covers the measurement of 'generalized observables'
associated with positive operator valued measures (POVMs). Let

\[ E_m = M_m^\dagger M_m \tag{47} \]

then the set $\{E_m\}$ defines a set of positive operators ('effects') such that

\[ \sum_m E_m = I. \tag{48} \]

A POVM can be regarded as a generalization of a projection valued measure (PVM),
in the sense that Eq. (48) defines a 'resolution of the identity' without requiring the
PVM orthogonality condition:

\[ P_m P_{m'} = \delta_{mm'} P_m. \tag{49} \]

Note that for a POVM:

\[ p(m) = \langle\psi|E_m|\psi\rangle. \tag{50} \]

5 The expected value of a projection operator, which is an idempotent observable with eigenvalues
0 and 1, is equal to the probability of obtaining the eigenvalue 1. Here the eigenvalue 1
corresponds to the outcome m.

Given a set of positive operators $\{E_m\}$ such that $\sum_m E_m = I$, measurement operators
$M_m$ can be defined via

\[ M_m = U\sqrt{E_m}, \tag{51} \]

where U is a unitary operator, from which it follows that

\[ \sum_m M_m^\dagger M_m = \sum_m E_m = I. \tag{52} \]

As a special case, of course, we can take $U = I$ and $M_m = \sqrt{E_m}$. Conversely,
given a set of measurement operators $\{M_m\}$, there exist unitary operators $U_m$ such
that $M_m = U_m\sqrt{E_m}$, where $\{E_m\}$ is a POVM. (This follows immediately from
[Nielsen and Chuang, 2000, Theorem 2.3, p. 78]; see [Nielsen and Chuang, 2000,
Exercise 2.63, p. 92].)

Except for the standard case of projective measurements, one might wonder why
it might be useful to single out such unitary transformations, and why in the general
case such a process should be called a measurement of Q. The following example,
taken from [Nielsen and Chuang, 2000, p. 92], is illuminating. Suppose we know that
a system with a 2-dimensional Hilbert space is in one of two nonorthogonal states:

\begin{align}
|\psi_1\rangle &= |0\rangle \\
|\psi_2\rangle &= \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)
\end{align}

It is impossible to reliably distinguish these states by a quantum measurement, even in 
the above generalized sense. Here 'reliably' means that the state is identified correctly 
with zero probability of error. 

To see this, suppose there is such a measurement, defined by two measurement
operators $M_1, M_2$ satisfying the completeness condition. Then we require

\[ p(1) = \langle\psi_1|M_1^\dagger M_1|\psi_1\rangle = 1, \tag{53} \]

to represent reliability if the state is $|\psi_1\rangle$; and

\[ p(2) = \langle\psi_2|M_2^\dagger M_2|\psi_2\rangle = 1 \tag{54} \]

to represent reliability if the state is $|\psi_2\rangle$. By the completeness condition we must have

\[ \langle\psi_1|M_1^\dagger M_1 + M_2^\dagger M_2|\psi_1\rangle = 1 \tag{55} \]

from which it follows that $\langle\psi_1|M_2^\dagger M_2|\psi_1\rangle = 0$, i.e.,
$M_2|\psi_1\rangle = M_2|0\rangle = 0$. Hence

\[ M_2|\psi_2\rangle = M_2\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = \frac{1}{\sqrt{2}}M_2|1\rangle \tag{56} \]

and so

\[ p(2) = \langle\psi_2|M_2^\dagger M_2|\psi_2\rangle = \frac{1}{2}\langle 1|M_2^\dagger M_2|1\rangle. \tag{57} \]

But by the completeness condition we also have

\[ \langle 1|M_2^\dagger M_2|1\rangle \leq \langle 1|M_1^\dagger M_1 + M_2^\dagger M_2|1\rangle = \langle 1|1\rangle = 1 \tag{58} \]

from which it follows that

\[ p(2) \leq \frac{1}{2} \tag{59} \]

which contradicts Eq. (54).

However, it is possible to perform a measurement in the generalized sense, with
three possible outcomes, that will allow us to correctly identify the state some of the
time, i.e., for two of the possible outcomes, while nothing about the identity of the state
can be inferred from the third outcome.

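Here's how: The three operators

\begin{align}
E_1 &= \frac{\sqrt{2}}{1 + \sqrt{2}}\,\frac{(|0\rangle - |1\rangle)(\langle 0| - \langle 1|)}{2} \\
E_2 &= \frac{\sqrt{2}}{1 + \sqrt{2}}\,|1\rangle\langle 1| \\
E_3 &= I - E_1 - E_2 \tag{60}
\end{align}

are all positive operators and $E_1 + E_2 + E_3 = I$, so they define a POVM. In fact,
$E_1, E_2, E_3$ are each multiples of projection operators onto the states

\begin{align}
|\phi_1\rangle &= |\psi_2\rangle^\perp = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \\
|\phi_2\rangle &= |\psi_1\rangle^\perp = |1\rangle \\
|\phi_3\rangle &= \frac{(1 + \sqrt{2})|0\rangle + |1\rangle}{\sqrt{2\sqrt{2}(1 + \sqrt{2})}} \tag{61}
\end{align}

with coefficients $\frac{\sqrt{2}}{1+\sqrt{2}}$, $\frac{\sqrt{2}}{1+\sqrt{2}}$, $\frac{2}{1+\sqrt{2}}$
respectively. The measurement involves a system M with three orthogonal pointer states
$|1\rangle, |2\rangle, |3\rangle$. The appropriate unitary interaction U results in the transition,
for an input state $|\psi\rangle$:

\[ |\psi\rangle|0\rangle \xrightarrow{U} \sum_m M_m|\psi\rangle|m\rangle \tag{62} \]

where $M_m = \sqrt{E_m}$.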

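If the input state is $|\psi_1\rangle = |0\rangle$, we have the transition:

\begin{align}
|\psi_1\rangle|0\rangle &\xrightarrow{U} \sqrt{E_1}|\psi_1\rangle|1\rangle + \sqrt{E_3}|\psi_1\rangle|3\rangle \\
&= \alpha|\phi_1\rangle|1\rangle + \beta|\phi_3\rangle|3\rangle \tag{63}
\end{align}

(because $\sqrt{E_2}|\psi_1\rangle = \sqrt{E_2}|0\rangle = 0$). And if the input state is
$|\psi_2\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, we have the transition:

\begin{align}
|\psi_2\rangle|0\rangle &\xrightarrow{U} \sqrt{E_2}|\psi_2\rangle|2\rangle + \sqrt{E_3}|\psi_2\rangle|3\rangle \\
&= \gamma|\phi_2\rangle|2\rangle + \delta|\phi_3\rangle|3\rangle \tag{64}
\end{align}

(because $\sqrt{E_1}|\psi_2\rangle = \sqrt{E_1}\frac{|0\rangle + |1\rangle}{\sqrt{2}} = 0$), where
$\alpha, \beta, \gamma, \delta$ are real numerical coefficients.

We see that a projective measurement of the pointer of M that yields the outcome
m = 1 indicates, with certainty, that the input state was $|\psi_1\rangle = |0\rangle$. In this case, the
measurement leaves the system Q in the state $|\phi_1\rangle$. A measurement outcome m = 2
indicates, with certainty, that the input state was $|\psi_2\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$,
and in this case the measurement leaves the system Q in the state $|\phi_2\rangle$. If the outcome is
m = 3, the input state could have been either $|\psi_1\rangle = |0\rangle$ or
$|\psi_2\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, and Q is left in the state $|\phi_3\rangle$.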
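
The outcome statistics of this three-outcome measurement can be tabulated numerically. The
sketch below is a minimal check, assuming the operators as described above ($E_1$ proportional
to the projector onto $(|0\rangle - |1\rangle)/\sqrt{2}$, $E_2$ proportional to the projector onto
$|1\rangle$); the helper names `outer` and `prob` are illustrative.

```python
import math

c = math.sqrt(2) / (1 + math.sqrt(2))   # common coefficient of E1 and E2
r = 1 / math.sqrt(2)


def outer(v, scale=1.0):
    """scale * |v><v| for a real 2-vector v."""
    return [[scale * v[i] * v[j] for j in range(2)] for i in range(2)]


psi1 = (1.0, 0.0)   # |0>
psi2 = (r, r)       # (|0> + |1>)/sqrt(2)

E1 = outer((r, -r), c)       # multiple of the projector onto (|0> - |1>)/sqrt(2)
E2 = outer((0.0, 1.0), c)    # multiple of the projector onto |1>
E3 = [[(1.0 if i == j else 0.0) - E1[i][j] - E2[i][j] for j in range(2)]
      for i in range(2)]


def prob(E, psi):
    """p(m) = <psi|E_m|psi> for a real state vector psi (Eq. 50)."""
    return sum(psi[i] * E[i][j] * psi[j] for i in range(2) for j in range(2))


table = {name: [prob(E, psi) for E in (E1, E2, E3)]
         for name, psi in (("psi1", psi1), ("psi2", psi2))}
print(table)
```

Outcome 2 never occurs for $|\psi_1\rangle$ and outcome 1 never occurs for $|\psi_2\rangle$, so
outcomes 1 and 2 identify the input with certainty; the price is the inconclusive outcome 3,
which occurs with probability $1/\sqrt{2}$ for either input.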



3.1.3 Quantum Operations 

When a closed system QE evolves under a unitary transformation, Q can be shown to
evolve under a quantum operation, i.e., a completely positive linear map:

\[ \mathcal{E} : \rho \rightarrow \rho' \tag{65} \]

where

\[ \mathcal{E}(\rho) = \mathrm{Tr}_E(U \rho \otimes \rho_E U^\dagger) \tag{66} \]

(See [Nielsen and Chuang, 2000, p. 356 ff].) The map $\mathcal{E}$ is linear (or convex-linear)
in the sense that $\mathcal{E}(\sum_i p_i \rho_i) = \sum_i p_i \mathcal{E}(\rho_i)$, positive in the
sense that $\mathcal{E}$ maps positive operators to positive operators, and completely positive in
the sense that $\mathcal{E} \otimes I$ is a positive map on the extension of $\mathcal{H}^Q$ to a
Hilbert space $\mathcal{H}^Q \otimes \mathcal{H}^E$, associated with the addition of any ancilla
system E to Q.

Every quantum operation (i.e., completely positive linear map) on a Hilbert space
$\mathcal{H}^Q$ has a (non-unique) representation as a unitary evolution on an extended Hilbert
space $\mathcal{H}^Q \otimes \mathcal{H}^E$, i.e.,

\[ \mathcal{E}(\rho) = \mathrm{Tr}_E(U(\rho \otimes \rho_E)U^\dagger) \tag{67} \]

where $\rho_E$ is an appropriately chosen initial state of an ancilla system E (which we can
think of as the environment of Q). It turns out that it suffices to take $\rho_E$ as a pure state,
i.e., $|0\rangle\langle 0|$, since a mixed state of E can always be purified by enlarging the Hilbert
space (i.e., adding a further ancilla system). So the evolution of a system Q described
by a quantum operation can always be modeled as the unitary evolution of a system
QE, for an initial pure state of E.

Also, every quantum operation on a Hilbert space $\mathcal{H}^Q$ has a (non-unique) operator
sum representation intrinsic to $\mathcal{H}^Q$:

\[ \mathcal{E}(\rho) = \sum_i E_i \rho E_i^\dagger \tag{68} \]

where $E_i = \langle i|U|0\rangle$ for some orthonormal basis $\{|i\rangle\}$ of E. (See
[Nielsen and Chuang, 2000, Theorem 8.1, p. 368].) If the operation is trace-preserving (or
nonselective), then $\sum_i E_i^\dagger E_i = I$. For operations that are not trace-preserving
(or selective), $\sum_i E_i^\dagger E_i \leq I$. This corresponds to the case where the outcome of a
measurement on QE is taken into account (selected) in the transition $\rho \rightarrow \mathcal{E}(\rho)$.

If there is no interaction between Q and E, then $\mathcal{E}(\rho) = U_Q \rho U_Q^\dagger$,
$U_Q U_Q^\dagger = I$, i.e., there is only one operator in the sum. In this case,
$U = U_Q \otimes U_E$ and

\begin{align}
\mathcal{E}(\rho) &= \mathrm{Tr}_E(U_Q \otimes U_E (\rho \otimes |0\rangle\langle 0|) U_Q^\dagger \otimes U_E^\dagger) \tag{69} \\
&= U_Q \rho U_Q^\dagger. \tag{70}
\end{align}

So unitary evolution is a special case of the operator sum representation of a quantum
operation and, of course, another special case is the transition $\rho \rightarrow \mathcal{E}(\rho)$
that occurs in a quantum measurement process, where $E_i = M_i$. A trace-preserving operation
corresponds to a non-selective measurement:

\[ \mathcal{E}(\rho) = \sum_i M_i \rho M_i^\dagger \tag{71} \]

while an operation that is not trace-preserving corresponds to a selective measurement,
where the state 'collapses' onto the corresponding measurement outcome:

\[ M_i \rho M_i^\dagger / \mathrm{Tr}(M_i \rho M_i^\dagger). \tag{72} \]

The operator sum representation applies to quantum operations between possibly
different input and output Hilbert spaces, and characterizes the following general
situation: a quantum system in an unknown initial state $\rho$ is allowed to interact unitarily
with other systems prepared in standard states, after which some part of the composite
system is discarded, leaving the final system in a state $\rho'$. The transition
$\rho \rightarrow \rho'$ is defined by a quantum operation. So a quantum operation represents,
quite generally, the unitary evolution of a closed quantum system, the nonunitary evolution of an
open quantum system in interaction with its environment, and evolutions that result from a
combination of unitary interactions and selective or nonselective measurements.

As we have seen, the creed of the Church of the Larger Hilbert Space is that every
state can be made pure, every measurement can be made ideal, and every evolution can
be made unitary - on a larger Hilbert space.⁶

3.2 Von Neumann Entropy 

In this section, I define the von Neumann entropy of a mixture of quantum states (von 
Neumann's generalization of the Shannon entropy of a classical probability distribu- 
tion characterizing a classical information source) and the corresponding notions of 
conditional entropy and mutual information. 

Information in Shannon's sense is a quantifiable resource associated with the output
of a (suitably idealized) stochastic source of symbolic states, where the physical
nature of the systems embodying these states is irrelevant to the amount of classical
information associated with the source. The quantity of information associated with
a stochastic source is defined by its optimal compressibility, and this is given by the
Shannon entropy. The fact that some feature of the output of a stochastic source can
be optimally compressed is, ultimately, what justifies the attribution of a quantifiable
resource to the source.

6 The Creed originates with John Smolin. I owe this formulation to Ben Schumacher. See his Lecture
Notes on Quantum Information Theory [1998].

Information is represented physically in the states of physical systems. The es- 
sential difference between classical and quantum information arises because of the 
different distinguishability properties of classical and quantum states. As we will see 
below, only sets of orthogonal quantum states are reliably distinguishable (i.e., with 
zero probability of error), as are sets of different classical states (which are represented 
by disjoint singleton subsets in a phase space, and so are orthogonal as subsets of phase 
space in a sense analogous to orthogonal subspaces of a Hilbert space). 

Classical information is that sort of information represented in a set of distinguish- 
able states — states of classical systems, or orthogonal quantum states — and so can be 
regarded as a subcategory of quantum information, where the states may or may not be 
distinguishable. The idea behind quantum information is to extend Shannon's notion 
of compressibility to a stochastic source of quantum states, which may or may not be 
distinguishable. For this we need to define a suitable measure of information for prob- 
ability distributions of quantum states — mixtures — as a generalization of the notion of 
Shannon entropy. 

Consider a system QE in an entangled state $|\Psi\rangle$. Then the subsystem Q is in a
mixed state $\rho$, which can always be expressed as:

\[ \rho = \sum_i p_i\,|i\rangle\langle i| \tag{73} \]

where the $p_i$ are the eigenvalues of $\rho$ and the pure states $|i\rangle$ are orthonormal
eigenstates of $\rho$. This is the spectral representation of $\rho$, and any density operator (a
positive, hence Hermitian, operator) can be expressed in this way. The representation is unique
if and only if the $p_i$ are all distinct. If some of the $p_i$ are equal, there is a unique
representation of $\rho$ as a sum of projection operators with the distinct values of the $p_i$ as
coefficients, but some of the projection operators will project onto multi-dimensional
subspaces.

Since $\rho$ has unit trace, $\sum_i p_i = 1$, and so the spectral representation of $\rho$
represents a classical probability distribution of orthogonal, and hence distinguishable, pure
states. If we measure a Q-observable with eigenstates $|i\rangle$, then the outcomes can be
associated with the values of a random variable X, where $\Pr(X = i) = p_i$. Then

\[ H(X) = -\sum_i p_i \log p_i \tag{74} \]

is the Shannon entropy of the probability distribution of measurement outcomes.
Now,

\[ -\mathrm{Tr}(\rho \log \rho) = -\sum_i p_i \log p_i \tag{75} \]

(because the eigenvalues of $\rho \log \rho$ are $p_i \log p_i$ and the trace of an operator is the
sum of the eigenvalues), so a natural generalization of Shannon entropy for any mixture of
quantum states with density operator $\rho$ is the von Neumann entropy:⁷

\[ S := -\mathrm{Tr}(\rho \log \rho) \tag{76} \]

which coincides with the Shannon entropy for measurements in the eigenbasis of $\rho$.
For a completely mixed state $\rho = I/d$, where $\dim \mathcal{H}^Q = d$, the d eigenvalues of
$\rho$ are all equal to 1/d and $S = \log d$. This is the maximum value of S in a d-dimensional
Hilbert space. The von Neumann entropy S is zero, the minimum value, if and only if
$\rho$ is a pure state, where the eigenvalues of $\rho$ are 1 and 0. So $0 \leq S \leq \log d$, where d
is the dimension of $\mathcal{H}^Q$.
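
For a single qubit the von Neumann entropy is easy to evaluate by hand or by machine. The
following is a minimal sketch, assuming logarithms to base 2 and a real symmetric
$2 \times 2$ density matrix, so that the eigenvalues follow from the characteristic
polynomial; the function names are illustrative.

```python
import math


def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix via its characteristic polynomial."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return [(tr + disc) / 2, (tr - disc) / 2]


def shannon_entropy(probs):
    """H = -sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)


def von_neumann_entropy(rho):
    """S = -Tr(rho log2 rho), computed from the eigenvalues of rho (Eq. 75)."""
    return shannon_entropy(eigenvalues_2x2(rho))


completely_mixed = [[0.5, 0.0], [0.0, 0.5]]   # I/2: S = log2(2) = 1
pure_plus = [[0.5, 0.5], [0.5, 0.5]]          # a pure state: S = 0
biased = [[0.9, 0.0], [0.0, 0.1]]             # S equals H(0.9, 0.1)

for rho in (completely_mixed, pure_plus, biased):
    print(von_neumann_entropy(rho))
```

The three cases illustrate the maximum value $\log d$ for the completely mixed state, the
minimum value 0 for a pure state, and agreement with the Shannon entropy in the eigenbasis.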

Recall that we can think of the Shannon entropy as a measure of the average amount
of information gained by identifying the state produced by a known stochastic source.
Alternatively, the Shannon entropy represents the optimal compressibility of the
information produced by an information source. The von Neumann entropy does not, in
general, represent the amount of information gained by identifying the quantum state
produced by a stochastic source characterized as a mixed state, because nonorthogonal
quantum states in a mixture cannot be reliably identified. However, as we will
see in §3.5, the von Neumann entropy can be interpreted in terms of compressibility
via Schumacher's source coding theorem for quantum information [Schumacher, 1995],
a generalization of Shannon's source coding theorem for classical information. For an
elementary two-state quantum system with a 2-dimensional Hilbert space considered as
representing the output of an elementary quantum information source, $S = 1$ for an equal weight
distribution over two orthogonal states (i.e., for the density operator $\rho = I/2$), so
Schumacher takes the basic unit of quantum information as the 'qubit.' By analogy with the term
'bit,' the term 'qubit' refers to the basic unit of quantum information in terms of the von
Neumann entropy, and to an elementary two-state quantum system considered as representing the
possible outputs of an elementary quantum information source.

The difference between quantum information as measured by von Neumann entropy
S and classical information as measured by Shannon entropy H can be brought
out by considering the quantum notions of conditional entropy and mutual information
(cf. §2.2), and in particular the peculiar feature of inaccessibility associated with
quantum information.

For a composite system AB, conditional von Neumann entropy and mutual information
are defined in terms of the joint entropy $S(AB) = -\mathrm{Tr}(\rho_{AB} \log \rho_{AB})$ by
analogy with the corresponding notions for Shannon entropy (cf. Eqs. (19), (23), (24)):

\begin{align}
S(A|B) &= S(A,B) - S(B) \tag{77} \\
S(A:B) &= S(A) - S(A|B) \tag{78} \\
&= S(B) - S(B|A) \tag{79} \\
&= S(A) + S(B) - S(A,B) \tag{80}
\end{align}

The joint entropy satisfies the subadditivity inequality:

\[ S(A,B) \leq S(A) + S(B) \tag{81} \]

with equality if and only if A and B are uncorrelated, i.e., $\rho_{AB} = \rho_A \otimes \rho_B$.

7 Von Neumann first defined this quantity on the basis of a thermodynamic argument in [1955, p. 379].

Now, $S(A|B)$ can be negative, while the conditional Shannon entropy is always
positive or zero. Consider, for example, the entangled state
$|\Psi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$. Since $|\Psi\rangle$ is a pure state,
$S(A,B) = 0$. But $S(A) = S(B) = 1$. So $S(A|B) = S(A,B) - S(B) = -1$. In fact, for a pure state
$|\Psi\rangle$ of a composite system AB, $S(A|B) < 0$ if and only if $|\Psi\rangle$ is entangled.
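
The negative conditional entropy of the entangled state can be reproduced with a few lines of
arithmetic. This is a sketch under stated assumptions: two qubits, real amplitudes, base-2
logarithms, and a pure joint state, so that $S(A,B) = 0$ is used directly rather than
computed. The function names are illustrative.

```python
import math


def reduced_state_B(psi):
    """Partial trace over A of |psi><psi| for real amplitudes psi[a][b]."""
    return [[sum(psi[a][b] * psi[a][b2] for a in (0, 1)) for b2 in (0, 1)]
            for b in (0, 1)]


def entropy_2x2(rho):
    """Von Neumann entropy (bits) of a real symmetric 2x2 density matrix."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)


def conditional_entropy_pure(psi):
    """S(A|B) = S(A,B) - S(B) for a pure two-qubit state, where S(A,B) = 0."""
    return 0.0 - entropy_2x2(reduced_state_B(psi))


r = 1 / math.sqrt(2)
bell = [[r, 0.0], [0.0, r]]        # (|00> + |11>)/sqrt(2), entangled
product = [[r, r], [0.0, 0.0]]     # |0>(|0> + |1>)/sqrt(2), not entangled

print(conditional_entropy_pure(bell), conditional_entropy_pure(product))
```

The product state gives a conditional entropy of zero, while the entangled state gives a
value of about -1, in line with the statement above.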

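For a composite system AB in a product state $\rho \otimes \sigma$, it follows from the definition
of joint entropy that:

\[ S(A,B) = S(\rho \otimes \sigma) = S(\rho) + S(\sigma) = S(A) + S(B). \tag{82} \]

If AB is in a pure state $|\Psi\rangle$, it follows from the Schmidt decomposition theorem that
$|\Psi\rangle$ can be expressed as

\[ |\Psi\rangle = \sum_i \sqrt{p_i}\,|i\rangle|i\rangle \tag{83} \]

from which it follows that

\begin{align}
\rho_A &= \mathrm{Tr}_B(|\Psi\rangle\langle\Psi|) = \sum_i p_i\,|i\rangle\langle i| \\
\rho_B &= \mathrm{Tr}_A(|\Psi\rangle\langle\Psi|) = \sum_i p_i\,|i\rangle\langle i|; \tag{84}
\end{align}

and so:

\[ S(A) = S(B) = -\sum_i p_i \log p_i. \tag{85} \]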



Consider a mixed state prepared as a mixture of states $\rho_i$ with weights $p_i$. It can be
shown that

\[ S\Big(\sum_i p_i \rho_i\Big) \leq H(p_i) + \sum_i p_i S(\rho_i) \tag{86} \]

with equality if and only if the states $\rho_i$ have support on orthogonal subspaces (see
[Nielsen and Chuang, 2000, Theorem 11.10, p. 518]). The entropy $H(p_i)$ is referred
to as the entropy of preparation of the mixture $\rho$.

If the states $\rho_i$ are pure states, then $S(\rho) \leq H(p_i)$. For example, suppose
$\mathcal{H}^Q$ is 2-dimensional and $p_1 = p_2 = 1/2$, then $H(p_i) = 1$. So if we had a classical
information source producing the symbols 1 and 2 with equal probabilities, no compression
of the information would be possible. However, if the symbols 1 and 2 are encoded as
nonorthogonal quantum states $|r_1\rangle$ and $|r_2\rangle$, then $S(\rho) < 1$. As we will see in
§3.5, according to Schumacher's source coding theorem, since $S(\rho) < 1$, quantum compression
is possible, i.e., we can transmit long sequences of qubits reliably using $S < 1$
qubits per quantum state produced by the source.
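
To see how far $S(\rho)$ falls below the entropy of preparation, consider encoding the two
symbols in states separated by an angle $\theta$ in Hilbert space. The sketch below is
illustrative: the parametrization $|r_1\rangle = |0\rangle$,
$|r_2\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$ and the sample angles are
assumptions of this example, and logarithms are taken to base 2.

```python
import math


def entropy_2x2(rho):
    """Von Neumann entropy (bits) of a real symmetric 2x2 density matrix."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)


def source_entropy(theta):
    """S(rho) for the equal-weight mixture of |r1> = |0> and
    |r2> = cos(theta)|0> + sin(theta)|1> (theta is a hypothetical parameter)."""
    c, s = math.cos(theta), math.sin(theta)
    rho = [[0.5 * (1 + c * c), 0.5 * c * s],
           [0.5 * c * s, 0.5 * s * s]]
    return entropy_2x2(rho)


preparation_entropy = 1.0   # H(1/2, 1/2) in bits
for theta in (math.pi / 2, math.pi / 4, math.pi / 8):
    print(theta, source_entropy(theta), preparation_entropy)
```

For orthogonal states ($\theta = \pi/2$) the two entropies coincide; as the states become
less distinguishable, $S(\rho)$ drops below 1 bit, which is the margin that quantum
compression exploits.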

Note that if AB is prepared in a mixture of states $\rho_i \otimes |i\rangle\langle i|$ with weights
$p_i$, where the $\rho_i$ are any density operators, not necessarily orthogonal, then it follows
from (86), (82), and the fact that $S(|i\rangle\langle i|) = 0$ that

\begin{align}
S\Big(\sum_i p_i \rho_i \otimes |i\rangle\langle i|\Big) &= H(p_i) + \sum_i p_i S(\rho_i \otimes |i\rangle\langle i|) \\
&= H(p_i) + \sum_i p_i S(\rho_i). \tag{87}
\end{align}

The von Neumann entropy of a mixture of states $\rho_i$ with weights $p_i$, $\sum_i p_i \rho_i$, is a
concave function of the states in the distribution, i.e.,

\[ S\Big(\sum_i p_i \rho_i\Big) \geq \sum_i p_i S(\rho_i). \tag{88} \]

To see this, consider a composite system AB in the state

\[ \rho_{AB} = \sum_i p_i \rho_i \otimes |i\rangle\langle i|. \tag{89} \]

We have

\begin{align}
S(A) &= S\Big(\sum_i p_i \rho_i\Big) \tag{90} \\
S(B) &= S\Big(\sum_i p_i |i\rangle\langle i|\Big) = H(p_i) \tag{91}
\end{align}

and

\[ S(A,B) = H(p_i) + \sum_i p_i S(\rho_i) \tag{92} \]

by equation (87). By subadditivity $S(A) + S(B) \geq S(A,B)$, so:

\[ S\Big(\sum_i p_i \rho_i\Big) \geq \sum_i p_i S(\rho_i). \tag{93} \]

It turns out that projective measurements never decrease entropy, i.e., if
$\rho' = \sum_i P_i \rho P_i$, then $S(\rho') \geq S(\rho)$, but generalized measurements can
decrease entropy. Consider, for example, the generalized measurement on a qubit in the initial
state $\rho$ defined by the measurement operators $M_1 = |0\rangle\langle 0|$ and
$M_2 = |0\rangle\langle 1|$. (Note that these operators do define a generalized measurement because
$M_1^\dagger M_1 + M_2^\dagger M_2 = |0\rangle\langle 0| + |1\rangle\langle 1| = I$.) After the
measurement

\begin{align}
\rho' &= |0\rangle\langle 0|\rho|0\rangle\langle 0| + |0\rangle\langle 1|\rho|1\rangle\langle 0| \\
&= \mathrm{Tr}(\rho)\,|0\rangle\langle 0| \\
&= |0\rangle\langle 0|. \tag{94}
\end{align}

So $S(\rho') = 0 \leq S(\rho)$.
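
The contrast between projective and generalized measurements can be checked on the smallest
possible example. This is a minimal sketch restricted to real $2 \times 2$ matrices and
base-2 logarithms; the helper names and the two test states are assumptions of the example.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def entropy_2x2(rho):
    """Von Neumann entropy (bits) of a real symmetric 2x2 density matrix."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)


def nonselective(rho, ops):
    """rho' = sum_m M_m rho M_m^dagger for real 2x2 measurement operators."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for m in ops:
        term = matmul(matmul(m, rho), transpose(m))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out


M1 = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
M2 = [[0.0, 1.0], [0.0, 0.0]]   # |0><1|
P0 = [[1.0, 0.0], [0.0, 0.0]]   # projector |0><0|
P1 = [[0.0, 0.0], [0.0, 1.0]]   # projector |1><1|

mixed = [[0.5, 0.0], [0.0, 0.5]]   # I/2, entropy 1
plus = [[0.5, 0.5], [0.5, 0.5]]    # pure state (|0> + |1>)/sqrt(2), entropy 0

after_generalized = nonselective(mixed, [M1, M2])   # entropy falls from 1 to 0
after_projective = nonselective(plus, [P0, P1])     # entropy rises from 0 to 1
print(entropy_2x2(after_generalized), entropy_2x2(after_projective))
```

The generalized measurement maps every input to $|0\rangle\langle 0|$, while the projective
measurement in the computational basis turns the pure superposition into the completely
mixed state.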

3.3 The 'No Cloning' Theorem 



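In §3.1 we saw that two nonorthogonal quantum states cannot be reliably distinguished
by any measurement. A 'no cloning' theorem establishes that nonorthogonal quantum
states cannot be copied. To see this, suppose there were a device D that could copy any
input quantum state of a system Q with states in $\mathcal{H}^Q$. Suppose the initial ready state
of the device D is $|0\rangle \in \mathcal{H}^D$. Then we require, for any orthonormal set of input
states $\{|i\rangle\}$:

\[ |i\rangle|0\rangle \xrightarrow{U} |i\rangle|i\rangle \tag{95} \]

where U is the unitary transformation that implements the copying process. By linearity,
it then follows that for any input state $\sum_i c_i|i\rangle$:

\[ \sum_i c_i|i\rangle|0\rangle \xrightarrow{U} \sum_i c_i|i\rangle|i\rangle. \tag{96} \]

But for copying we require that:

\[ \sum_i c_i|i\rangle|0\rangle \xrightarrow{U} \Big(\sum_i c_i|i\rangle\Big)\Big(\sum_i c_i|i\rangle\Big) \tag{97} \]

and

\[ \sum_i c_i|i\rangle|i\rangle \neq \Big(\sum_i c_i|i\rangle\Big)\Big(\sum_i c_i|i\rangle\Big) = \sum_{ij} c_i c_j|i\rangle|j\rangle \tag{98} \]

unless $c_i c_j = c_i\delta_{ij}$, i.e., unless the input is itself one of the states $|i\rangle$,
which means that the device could not copy any states that are not in the orthonormal set
$|i\rangle$.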

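Alternatively, one might note that if two states $|\psi\rangle$ and $|\phi\rangle$ could be copied,
then

\begin{align}
|\psi\rangle|0\rangle &\xrightarrow{U} |\psi\rangle|\psi\rangle \tag{99} \\
|\phi\rangle|0\rangle &\xrightarrow{U} |\phi\rangle|\phi\rangle \tag{100}
\end{align}

Since unitary transformations preserve inner products, we require that

\[ \langle\psi|\phi\rangle = \langle\psi|\phi\rangle\langle\psi|\phi\rangle \tag{101} \]

which is possible if and only if $\langle\psi|\phi\rangle = 1$ or 0. That is: for cloning to be
possible, either the states are identical, or they are orthogonal.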
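
A concrete copying device makes the argument tangible. The sketch below uses a CNOT gate as
the copier for the orthonormal set $|0\rangle, |1\rangle$; choosing CNOT for the unitary U is
an assumption of this illustration, as are the function names. It copies the two basis
states perfectly but fails on their superposition.

```python
import math


def cnot_copy(state):
    """Apply CNOT (first qubit controls, second is the target) to amplitudes
    ordered |0>|0>, |0>|1>, |1>|0>, |1>|1>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def with_blank(psi):
    """|psi>|0> for a single-qubit state psi = (alpha, beta)."""
    return [psi[0], 0.0, psi[1], 0.0]


def two_copies(psi):
    """|psi>|psi>."""
    return [psi[i] * psi[j] for i in (0, 1) for j in (0, 1)]


def fidelity(u, v):
    """|<u|v>|^2 for real amplitude lists."""
    return sum(x * y for x, y in zip(u, v)) ** 2


r = 1 / math.sqrt(2)
for name, psi in (("|0>", (1.0, 0.0)), ("|1>", (0.0, 1.0)), ("|+>", (r, r))):
    output = cnot_copy(with_blank(psi))
    print(name, fidelity(output, two_copies(psi)))
```

For the superposition the device produces the entangled state
$(|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2}$ of Eq. (96) rather than two copies, and the
overlap with the intended output is only 1/2.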

The 'no cloning' theorem was proved independently by Dieks [1982] and Wootters
and Zurek [1982]. An important extension of this result to mixtures is due to Barnum,
Caves, Fuchs, Jozsa, and Schumacher [1996a]. In a cloning process, a ready state $\sigma$
of a system B and the state to be cloned $\rho$ of a system A are transformed into two
copies of $\rho$. In a more general broadcasting process, a ready state $\sigma$ and the state to be
broadcast $\rho$ are transformed to a new state $\omega$ of AB, where the marginal states of
$\omega$ with respect to both A and B are $\rho$, i.e.,

\begin{align}
\rho_A &= \mathrm{Tr}_B(\omega) = \rho \tag{102} \\
\rho_B &= \mathrm{Tr}_A(\omega) = \rho \tag{103}
\end{align}

The 'no cloning' theorem states that a set of pure states can be cloned if and only if the
states are mutually orthogonal. The 'no broadcasting' theorem states that an arbitrary
set of states can be broadcast if and only if they are represented by mutually commuting
density operators. Classically, since all pure states are, in a formal sense, orthogonal
and all operators (representing real-valued functions on phase space) commute, both
cloning and broadcasting are possible. Note that broadcasting reduces to cloning for
pure states.

Of course, it is always possible to build a special-purpose device to clone a given
(known) quantum state $|\psi\rangle$, because this would simply be a device that prepares the
state $|\psi\rangle$. The 'no cloning' theorem, from another perspective, is just the statement
of the quantum measurement problem (see §7): measurements, in the classical sense
of reproducing in a second system a copy of the state of the first system (or, more
generally, a 'pointer state' that represents the state of the first system), are impossible
in quantum mechanics, except for measurements restricted to orthogonal sets of input
states.

A modification of the argument leading to Eqs. (99)-(101) shows that no information
gain about the identity of nonorthogonal states is possible without disturbing the
states. Suppose the device D acts as a measuring device that records some information
about the identity of the input state, i.e., the output state of the device is different for
different input states $|\psi\rangle$, $|\phi\rangle$; and that the device does not disturb the input
states. Then

\begin{align}
|\psi\rangle|0\rangle &\xrightarrow{U} |\psi\rangle|\psi'\rangle \tag{104} \\
|\phi\rangle|0\rangle &\xrightarrow{U} |\phi\rangle|\phi'\rangle \tag{105}
\end{align}

from which it follows that

\[ \langle\psi|\phi\rangle = \langle\psi|\phi\rangle\langle\psi'|\phi'\rangle \tag{106} \]

and so

\[ \langle\psi'|\phi'\rangle = 1 \tag{107} \]

since $\langle\psi|\phi\rangle \neq 0$ if $|\psi\rangle$ and $|\phi\rangle$ are nonorthogonal. In other
words, if there is no disturbance to the nonorthogonal input states, there can be no information
gain about the identity of the states. So, for example, an eavesdropper, Eve, could gain no
information about the identity of nonorthogonal quantum states communicated between Alice and
Bob without disturbing the states, which means that passive eavesdropping is impossible
for quantum information.

The observation that a set of pure states can be cloned if and only if they are mutually
orthogonal is equivalent to the observation that a set of pure states can be reliably
distinguished if and only if they are mutually orthogonal. For if we could distinguish a
pair of states $|\psi\rangle$ and $|\phi\rangle$, then we could copy them by simply preparing the
states with special-purpose preparation devices for $|\psi\rangle$ and $|\phi\rangle$. And if we
could copy the states, then we could prepare as many copies as we liked of each state.
Because the product states $|\psi\rangle^{\otimes n}$ and $|\phi\rangle^{\otimes n}$ become orthogonal
in the limit as $n \rightarrow \infty$, these states are certainly distinguishable, and so the
possibility of cloning the states $|\psi\rangle$ and $|\phi\rangle$ would provide a means of
distinguishing them.
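The approach to orthogonality can be illustrated numerically. The sketch below is only an illustration: the function name `copy_overlap` and the example overlap (two real qubit state vectors at an angle of 30 degrees) are assumptions introduced here. It uses the fact that the inner product of the $n$-fold product states is the $n$-th power of the single-copy inner product.

```python
import math

def copy_overlap(overlap: float, n: int) -> float:
    """Magnitude of the inner product between the n-fold product states,
    |<psi|phi>|**n, given the single-copy overlap <psi|phi>."""
    return abs(overlap) ** n

# Illustrative single-copy overlap: real state vectors at an angle of 30 degrees.
overlap = math.cos(math.radians(30))
for n in (1, 10, 100):
    print(n, copy_overlap(overlap, n))
```

For any overlap of magnitude strictly less than 1 the printed values shrink geometrically, which is the sense in which many copies of nonorthogonal states become distinguishable.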

Note also that, by a similar argument, cloning would allow different mixtures associated
with the same density operator to be distinguished. The equal-weight mixture of qubit
states $|\uparrow_z\rangle = |0\rangle$, $|\downarrow_z\rangle = |1\rangle$ (the eigenstates of the spin
observable $Z = \sigma_z$) has the same density operator, $I/2$, as the equal-weight mixture of
states $|\uparrow_x\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$,
$|\downarrow_x\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ (the eigenstates of $X = \sigma_x$).
Since the cloned states $|\uparrow_x\rangle^{\otimes n}$, $|\downarrow_x\rangle^{\otimes n}$ become
distinguishable from the cloned states $|\uparrow_z\rangle^{\otimes n}$,
$|\downarrow_z\rangle^{\otimes n}$, cloning would allow the two mixtures to be distinguished.
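That the two mixtures share the density operator $I/2$ can be checked directly. The following is a minimal sketch in plain Python; the helper names `mixture` and `max_difference` are assumptions made for this illustration, and 2x2 matrices are stored as nested lists.

```python
import math

def mixture(weighted_states):
    """Density matrix sum_i p_i |s_i><s_i| for (p_i, s_i) pairs of qubit states."""
    rho = [[0j, 0j], [0j, 0j]]
    for p, s in weighted_states:
        for r in range(2):
            for c in range(2):
                rho[r][c] += p * s[r] * complex(s[c]).conjugate()
    return rho

def max_difference(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

r = 1 / math.sqrt(2)
z_mixture = mixture([(0.5, [1, 0]), (0.5, [0, 1])])    # eigenstates of Z
x_mixture = mixture([(0.5, [r, r]), (0.5, [r, -r])])   # eigenstates of X

# The two matrices agree up to floating-point rounding.
print(max_difference(z_mixture, x_mixture))
```

No measurement on a single system can therefore tell the two preparations apart; only the hypothetical cloning step would break the equivalence.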

This possibility would also allow superluminal signalling. For suppose Alice and
Bob shared the entangled state $\frac{1}{\sqrt{2}}(|0\rangle|1\rangle - |1\rangle|0\rangle)$. If Alice
measured $X$ or $Z$ on her qubit, she would steer Bob's qubit into the mixture
$\frac{1}{2}|\uparrow_x\rangle\langle\uparrow_x| + \frac{1}{2}|\downarrow_x\rangle\langle\downarrow_x|$ or the
mixture $\frac{1}{2}|\uparrow_z\rangle\langle\uparrow_z| + \frac{1}{2}|\downarrow_z\rangle\langle\downarrow_z|$.
If Bob could distinguish these mixtures by cloning, in a shorter time than the time taken
for light to travel between Alice and Bob, he would be able to ascertain whether Alice
measured $X$ or $Z$, so 1 bit of information would be transferred from Alice to Bob
superluminally.

3.4 Accessible Information 

The ability to exploit quantum states to perform new sorts of information-processing 
tasks arises because quantum states have different distinguishability properties than 
classical states. Of course, it is not the mere lack of distinguishability of quantum states 
that is relevant here, but the different sort of distinguishability enjoyed by quantum 
states. This indistinguishability is reflected in the limited accessibility of quantum
information. 

To get a precise handle on this notion of accessibility, consider a classical information
source in Shannon's sense, with Shannon entropy $H(X)$. Suppose the source produces
symbols represented as the values $x$ (in an alphabet $\mathcal{X}$) of a random variable $X$,
with probabilities $p_x$, and that the symbols are encoded as quantum states $\rho_x$,
$x \in \mathcal{X}$. The mutual information $H(X:Y)$ (as defined by Eqs. (22), (23), (24)) is a
measure of how much information one gains, on average, about the value of the random
variable $X$ on the basis of the outcome $Y$ of a measurement on a given quantum
state. The accessible information is defined as:

$$\sup H(X:Y) \tag{108}$$

over all possible measurements.

The Holevo bound on mutual information provides an important upper bound to
accessible information:

$$H(X:Y) \leq S(\rho) - \sum_x p_x S(\rho_x) \tag{109}$$

where $\rho = \sum_x p_x \rho_x$ and the measurement outcome $Y$ is obtained from a measurement
defined by a POVM $\{E_y\}$. Since $S(\rho) - \sum_x p_x S(\rho_x) \leq H(X)$, with
equality if and only if the states $\rho_x$ have orthogonal support, we have:

$$H(X:Y) \leq H(X) \tag{110}$$

Note that the value of $X$ can be reliably identified from $Y$ if and only if
$H(X:Y) = H(X)$. If the states $\rho_x$ are orthogonal pure states, then in principle there
exists a measurement that will distinguish the states, and for such a measurement
$H(X:Y) = H(X)$. In this case, the accessible information is the same as the entropy of
preparation of the quantum states, $H(X)$. But if the states are nonorthogonal, then
$H(X:Y) < H(X)$ and there is no measurement, even in the generalized sense, that will
enable the reliable identification of $X$.

Note, in particular, that if the values of $X$ are encoded as the pure states of a qubit,
then $H(X:Y) \leq S(\rho)$ and $S(\rho) \leq 1$. It follows that at most 1 bit of information
can be extracted from a qubit by measurement. If $X$ has $k$ equiprobable values,
$H(X) = \log k$. Alice could encode these $k$ values into a qubit by preparing it in an
equal-weight mixture of $k$ nonorthogonal pure states, but Bob could only extract at
most 1 bit of information about the value of $X$. For an $n$-state quantum system associated
with an $n$-dimensional Hilbert space, $S(\rho) \leq \log n$. So even though Alice could
encode any amount of information into such an $n$-state quantum system (by preparing
the state as a mixture of nonorthogonal states), the most information that Bob could
extract from the state by measurement is $\log n$, which is the same as the maximum
amount of information that could be encoded into and extracted from an $n$-state classical
system. It might seem, then, that the inaccessibility of quantum information as
quantified by the Holevo bound would thwart any attempt to exploit quantum information
to perform nonclassical information-processing tasks. In the following sections,
we shall see that this is not the case: surprisingly, the inaccessibility of quantum
information can actually be exploited in information-processing tasks that transcend the
scope of classical information.
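The gap between the entropy of preparation and the Holevo bound can be made concrete for pure qubit states, where the bound reduces to $S(\rho)$. The sketch below is illustrative only: the function names and the two example ensembles (three equiprobable 'trine' states, and the pair $|0\rangle$, $(|0\rangle + |1\rangle)/\sqrt{2}$) are assumptions chosen for this example, and the entropy of a 2x2 density matrix is computed from its closed-form eigenvalues.

```python
import math

def shannon(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)

def ensemble_density(probs, states):
    """rho = sum_x p_x |psi_x><psi_x| for pure qubit states psi_x."""
    rho = [[0j, 0j], [0j, 0j]]
    for p, s in zip(probs, states):
        for r in range(2):
            for c in range(2):
                rho[r][c] += p * s[r] * complex(s[c]).conjugate()
    return rho

def qubit_entropy(rho):
    """von Neumann entropy (bits) of a unit-trace 2x2 density matrix,
    using the closed-form eigenvalues (1 +/- gap) / 2."""
    a, b, d = rho[0][0].real, rho[0][1], rho[1][1].real
    gap = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    return shannon([(1 + gap) / 2, (1 - gap) / 2])

def holevo_pure(probs, states):
    """Holevo quantity S(rho) - sum_x p_x S(rho_x); S(rho_x) = 0 for pure states."""
    return qubit_entropy(ensemble_density(probs, states))

# Three equiprobable 'trine' states: H(X) = log2(3), but the bound is 1 bit.
trine = [[math.cos(t / 2), math.sin(t / 2)]
         for t in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
chi_trine = holevo_pure([1 / 3] * 3, trine)

# Two equiprobable nonorthogonal states |0> and (|0> + |1>)/sqrt(2): H(X) = 1.
r = 1 / math.sqrt(2)
chi_pair = holevo_pure([0.5, 0.5], [[1, 0], [r, r]])

print(chi_trine, math.log2(3))
print(chi_pair, 1.0)
```

In both cases the first printed number (the Holevo bound) is smaller than the second (the entropy of preparation), as expected for nonorthogonal encodings.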

For an insightful derivation of the Holevo bound (essentially reproduced below),
see [Nielsen and Chuang, 2000, Theorem 12.1, p. 531]. The basic idea is the following:
Suppose Alice encodes the distinguishable symbols of a classical information
source with entropy $H(X)$ as quantum states $\rho_x$ (not necessarily orthogonal). That is,
Alice has a quantum system $P$, the preparation device, with an orthonormal pointer
basis $|x\rangle$ corresponding to the values of the random variable $X$, which are produced
by the source with probabilities $p_x$. The preparation interaction correlates the pointer
states $|x\rangle$ with the states $\rho_x$ of a quantum system $Q$, so that the final state of $P$
and $Q$ after the preparation interaction is:

$$\rho^{PQ} = \sum_x p_x |x\rangle\langle x| \otimes \rho_x. \tag{111}$$

Alice sends the system $Q$ to Bob, who attempts to determine the value of the random
variable $X$ by measuring the state of $Q$. The initial state of $P$, $Q$, and Bob's measuring
instrument $M$ is:

$$\rho^{PQM} = \sum_x p_x |x\rangle\langle x| \otimes \rho_x \otimes |0\rangle\langle 0| \tag{112}$$

where $|0\rangle\langle 0|$ is the initial ready state of $M$. Bob's measurement can be described by a
quantum operation $\mathcal{E}$ on the Hilbert space $\mathcal{H}^Q \otimes \mathcal{H}^M$ that stores a
value of $y$, associated with a POVM $\{E_y\}$ on $\mathcal{H}^Q$, in the pointer state $|y\rangle$ of $M$,
i.e., $\mathcal{E}$ is defined for any state $\sigma \in \mathcal{H}^Q$ and initial ready state
$|0\rangle \in \mathcal{H}^M$ by:

$$\sigma \otimes |0\rangle\langle 0| \xrightarrow{\mathcal{E}} \sum_y \sqrt{E_y}\,\sigma\,\sqrt{E_y} \otimes |y\rangle\langle y|. \tag{113}$$

We have (recall the definition of quantum mutual information in Eqs. (78)-(80); see
footnote 8):

$$S(P:Q) = S(P:Q,M) \tag{114}$$

because $M$ is initially uncorrelated with $PQ$ and

$$S(P':Q',M') \leq S(P:Q,M) \tag{115}$$

because it can be shown ([Nielsen and Chuang, 2000, Theorem 11.15, p. 522]) that
quantum operations never increase mutual information (primes here indicate states after
the application of $\mathcal{E}$). Finally:

$$S(P':M') \leq S(P':Q',M') \tag{116}$$

because discarding systems never increases mutual information ([Nielsen and Chuang, 2000,
Theorem 11.15, p. 522]), and so:

$$S(P':M') \leq S(P:Q) \tag{117}$$

which (following some algebraic manipulation) is the statement of the Holevo bound,
i.e., (117) reduces to (109).

8 The notation $S(P:Q,M)$ refers to the mutual information between the system $P$ and the
composite system consisting of the system $Q$ and the measuring device $M$, in the initial
state (112). That is, the comma notation refers to the joint system (cf. Eqs. (78)-(80)):
$S(P:Q,M) = S(P) - S(P|Q,M) = S(P) + S(Q,M) - S(P,Q,M)$.
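To see this, note (from (111)) that

$$\rho^{PQ} = \sum_x p_x |x\rangle\langle x| \otimes \rho_x \tag{118}$$

So $S(P) = H(p_x)$, $S(Q) = S(\sum_x p_x \rho_x) = S(\rho)$ and

$$S(P,Q) = H(p_x) + \sum_x p_x S(\rho_x) \tag{119}$$

since the states $|x\rangle\langle x| \otimes \rho_x$ have support on orthogonal subspaces in
$\mathcal{H}^P \otimes \mathcal{H}^Q$. It follows that

$$S(P:Q) = S(P) + S(Q) - S(P,Q) = S(\rho) - \sum_x p_x S(\rho_x) \tag{120}$$

which is the right hand side of the Holevo bound.

For the left hand side:

$$\rho^{P'M'} = \mathrm{Tr}_{Q'}(\rho^{P'Q'M'}) \tag{121}$$

$$= \mathrm{Tr}_{Q'}\Big(\sum_{xy} p_x |x\rangle\langle x| \otimes \sqrt{E_y}\,\rho_x\,\sqrt{E_y} \otimes |y\rangle\langle y|\Big) \tag{122}$$

$$= \sum_{xy} p_x \mathrm{Tr}(\sqrt{E_y}\,\rho_x\,\sqrt{E_y})\, |x\rangle\langle x| \otimes |y\rangle\langle y| \tag{123}$$

$$= \sum_{xy} p(x,y)\, |x\rangle\langle x| \otimes |y\rangle\langle y|, \tag{124}$$

since $p(x,y) = p_x\, p(y|x) = p_x \mathrm{Tr}(\rho_x E_y) = p_x \mathrm{Tr}(\sqrt{E_y}\,\rho_x\,\sqrt{E_y})$,
and so $S(P':M') = H(X:Y)$.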
The Holevo bound limits the representation of classical bits by qubits. Putting it
another way, the Holevo bound characterizes the resource cost of encoding classical
bits as qubits: one qubit is necessary and sufficient. Can we represent qubits by bits? If
so, what is the cost of a qubit in terms of bits? This question is answered by the following
result [Barnum et al., 2001]: A quantum source of nonorthogonal signal states can
be compressed with arbitrarily high fidelity to $\alpha$ qubits per signal plus any number of
classical bits per signal if and only if $\alpha$ is at least as large as the von Neumann entropy
$S$ of the source. This means that a generic quantum source cannot be separated into a
classical and quantum part: quantum information cannot be traded for any amount of
classical information.

3.5 Quantum Information Compression 

As pointed out in §3.2, Shannon's source coding theorem (noiseless channel coding
theorem) and the core notion of a typical sequence can be generalized for quantum
sources. This was first shown by Jozsa and Schumacher [1994] and Schumacher
[1995]. See also [Barnum et al., 1996b].

For a classical information bit source, where the output of the source is given by a
random variable $X$ with two possible values $x_1, x_2$ with probabilities $p_1, p_2$, the
Shannon entropy of the information produced by the source is $H(X) = H(p_1, p_2)$. So
by Shannon's source coding theorem the information can be compressed and communicated
to a receiver with arbitrarily low probability of error by using $H(X)$ bits per
signal, which is less than one bit if $p_1 \neq p_2$.

Now suppose the source produces qubit states $|\psi_1\rangle, |\psi_2\rangle$ with probabilities
$p_1, p_2$. The von Neumann entropy of the mixture
$\rho = p_1|\psi_1\rangle\langle\psi_1| + p_2|\psi_2\rangle\langle\psi_2|$ is $S(\rho)$. Schumacher's
generalization of Shannon's source coding theorem shows that the quantum
information encoded in the mixture $\rho$ can be compressed and communicated to a receiver
with arbitrarily low probability of error by using $S(\rho)$ qubits per signal, and
$S(\rho) < 1$ if the qubit states are nonorthogonal.

Note that the signals considered here are qubit states. What Schumacher's theorem
shows is that we can reliably communicate the sequence of qubit states produced by
the source by sending less than one qubit per signal. Note also that since
$S(\rho) < H(p_1, p_2)$ if the qubit states are nonorthogonal, the quantum information
represented by the sequence of qubit states can be compressed beyond the classical limit of
the classical information associated with the entropy of preparation of $\rho$ (i.e., the Shannon
entropy of the random variable whose values are the labels of the qubit states).

Since the individual states in a mixture are not in general distinguishable, there 
are two distinct sorts of compression applicable to quantum information that do not 
apply to classical information. In blind compression, the sequence of quantum states 
produced by a source is compressed via a compression scheme that depends only on the 
identities of the quantum states and their probabilities, i.e., the input to the compression 
scheme is the density operator associated with the distribution. In visible compression, 
the identity of each individual quantum state produced by the source is assumed to 
be known, i.e., the input to the compression scheme is an individual quantum state in 
the sequence produced by the source, and the compression of the state is based on the 
probability distribution of such states. 
An example: an inefficient visible compression scheme of the above qubit source
($|\psi_1\rangle, |\psi_2\rangle$ with probabilities $p_1, p_2$) would simply involve sending the
classical information of the quantum state labels, compressed to $H(p_1, p_2)$ bits per
signal, to the receiver, where the original qubit states could then be prepared after
decompression of the classical information. This scheme is not optimal by Schumacher's
theorem (for nonorthogonal qubit states) because $S(\rho) < H(p_1, p_2)$. Of course,
Schumacher's theorem refers to a compression rate of $S(\rho)$ qubits per quantum signal,
while the application of Shannon's theorem here refers to $H(p_1, p_2)$ bits per classical
signal. But note that the communication of one classical bit requires the same physical
resource as the communication of one qubit, prepared in one of two orthogonal basis states.
Note also that sending the (nonorthogonal) qubit states themselves, which would require
one qubit per signal, would not convey the identity of the states in the sequence
to the receiver. So the classical information about the individual state labels in the
sequence (which would be bounded by $\log n$ per signal if we considered a source producing
$n$ qubit states) is really redundant if the aim is to communicate the quantum
information associated with the sequence of qubit states.

Remarkably, Schumacher's theorem shows that the optimal compressibility of the 
quantum information associated with a sequence of quantum pure states is S(p) qubits 
per signal, for blind or visible compression. 

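To see the general idea, consider a source of (possibly nonorthogonal) qubits
$|\psi_1\rangle, |\psi_2\rangle$ with probabilities $p_1, p_2$. The density operator of the
probability distribution is $\rho = p_1|\psi_1\rangle\langle\psi_1| + p_2|\psi_2\rangle\langle\psi_2|$.

An $n$-sequence of states produced by the source is represented by a state

$$|\Psi_{i_1 \ldots i_n}\rangle = |\psi_{i_1}\rangle \cdots |\psi_{i_n}\rangle \tag{125}$$

in $\mathcal{H}_2^{\otimes n}$. Each such state has a probability
$p_{i_1 \ldots i_n} = p_{i_1} \cdots p_{i_n}$. The $n$-sequences span the $2^n$-dimensional Hilbert
space $\mathcal{H}_2^{\otimes n}$, but as $n \rightarrow \infty$ it turns out that the probability
of finding an $n$-sequence in a 'typical subspace' (in a measurement, on an $n$-sequence
produced by the source, of the projection operator onto the subspace) tends to 1. That
is, for any $\epsilon, \delta > 0$ and sufficiently large $n$, there is a subspace
$T_\delta^{(n)}$ of dimension between $2^{n(S(\rho)-\delta)}$ and $2^{n(S(\rho)+\delta)}$, with
projection operator $P_\delta^{(n)}$, such that:

$$\sum_{\text{all sequences}} p_{i_1 \ldots i_n} \mathrm{Tr}\big(|\Psi_{i_1 \ldots i_n}\rangle\langle\Psi_{i_1 \ldots i_n}|\, P_\delta^{(n)}\big) = \mathrm{Tr}\big(\rho^{\otimes n} P_\delta^{(n)}\big) > 1 - \epsilon. \tag{126}$$

Here $\rho^{\otimes n} = \rho \otimes \rho \otimes \cdots \otimes \rho$, the $n$-fold tensor product of
$\rho$ with itself, is the density operator of $n$-sequences of states produced by the source:

$$\rho^{\otimes n} = \sum_{\text{all } n\text{-sequences}} p_{i_1 \ldots i_n} |\Psi_{i_1 \ldots i_n}\rangle\langle\Psi_{i_1 \ldots i_n}| \tag{127}$$

$$= \sum_{\text{all } n\text{-sequences}} p_{i_1} \cdots p_{i_n}\, |\psi_{i_1}\rangle\langle\psi_{i_1}| \otimes \cdots \otimes |\psi_{i_n}\rangle\langle\psi_{i_n}| \tag{128}$$

where each state $|\psi_{i_j}\rangle$ is one of $k$ possible states in a $d$-dimensional Hilbert space.
Recall that the statistical properties of such $n$-sequences of states, for all possible
measurements, are given by $\rho^{\otimes n}$ and do not depend on the representation of
$\rho^{\otimes n}$ as a particular mixture of states. Since $S(\rho) < 1$ for a qubit source of
nonorthogonal states, the dimension of $T_\delta^{(n)}$ decreases exponentially relative to
that of $\mathcal{H}_2^{\otimes n}$ as $n \rightarrow \infty$, i.e., the typical subspace is
exponentially small in $\mathcal{H}_2^{\otimes n}$ for large $n$.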

Note that this does not mean that almost all $n$-sequences of states produced by the
source lie in the typical subspace. Rather, almost all $n$-sequences produced by the
source are such that a measurement of $P_\delta^{(n)}$ on the sequence will yield the value
1, i.e., almost all $n$-sequences produced by the source will answer 'yes' in a measurement
of the projection operator onto the typical subspace. So, in this sense, most sequences
produced by the source will be found to lie in the typical subspace on measurement,
and for any subspace $V$ of dimension less than $2^{n(S(\rho)-\delta)}$ it can be shown that
the average probability of finding an $n$-sequence produced by the source in $V$ is less than
any pre-assigned $\epsilon$ for sufficiently large $n$.

Consider now the general case where the source produces $k$ states
$|\psi_1\rangle, \ldots, |\psi_k\rangle \in \mathcal{H}_d$ (not necessarily orthogonal) with probabilities
$p_1, \ldots, p_k$. Here the density operator associated with the source is
$\rho = \sum_{i=1}^{k} p_i |\psi_i\rangle\langle\psi_i|$. Sequences of length $n$ span a subspace of
$d^n = 2^{n \log d}$ dimensions and the typical subspace $T_\delta^{(n)}$ has dimension
between $2^{n(S(\rho)-\delta)}$ and $2^{n(S(\rho)+\delta)}$, which is again exponentially small in
$\mathcal{H}_d^{\otimes n}$ because $S(\rho) < \log d$.

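For comparison with Shannon's theorem, we write $\rho$ in the spectral representation
as:

$$\rho = \sum_x p(x)|x\rangle\langle x| \tag{129}$$

where $\{p(x)\}$ is the set of non-zero eigenvalues of $\rho$ and $\{|x\rangle\}$ is an orthonormal
set of eigenstates of $\rho$. If $\rho$ has eigenvalues $p(x)$ and eigenstates $|x\rangle$, then
$\rho^{\otimes n}$ has eigenvalues $p(x_1)p(x_2) \cdots p(x_n)$ and eigenstates
$|x_1\rangle|x_2\rangle \cdots |x_n\rangle$.

A $\delta$-typical state is defined as a state $|x_1\rangle|x_2\rangle \cdots |x_n\rangle$ for which the
sequence $x_1, x_2, \ldots, x_n$ is a $\delta$-typical sequence, in the sense that (cf. Eq. (13)):

$$2^{-n(S(\rho)+\delta)} \leq p(x_1)p(x_2) \cdots p(x_n) \leq 2^{-n(S(\rho)-\delta)}. \tag{130}$$

The $\delta$-typical subspace $T_\delta^{(n)}$ is the subspace spanned by all the $\delta$-typical
states. Denote the projection operator onto $T_\delta^{(n)}$ by:

$$P_\delta^{(n)} = \sum_{\delta\text{-typical states}} |x_1\rangle\langle x_1| \otimes |x_2\rangle\langle x_2| \otimes \cdots \otimes |x_n\rangle\langle x_n| \tag{131}$$

Then, for a fixed $\delta > 0$, it can be shown that for any $\epsilon > 0$ and sufficiently large $n$

$$\mathrm{Tr}(P_\delta^{(n)} \rho^{\otimes n}) > 1 - \epsilon; \tag{132}$$

and the dimension of $T_\delta^{(n)}$ ($= \mathrm{Tr}(P_\delta^{(n)})$) satisfies

$$(1 - \epsilon)\,2^{n(S(\rho)-\delta)} \leq \dim T_\delta^{(n)} \leq 2^{n(S(\rho)+\delta)}. \tag{133}$$

That is, the dimension of $T_\delta^{(n)}$ is roughly $2^{nS(\rho)}$, which is exponentially
smaller than the dimension of $\mathcal{H}^{\otimes n}$, as $n \rightarrow \infty$.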
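The bounds on the typical subspace can be checked by direct counting in the qubit case, where the eigenvalues of $\rho$ are $p$ and $1 - p$ and the eigenstates of $\rho^{\otimes n}$ are labelled by binary strings. The following sketch is an illustration under assumed values ($p = 0.9$, $\delta = 0.1$, and the function names are introduced here); it uses the typicality condition of Eq. (130) with non-strict inequalities.

```python
import math

def binary_entropy(p):
    """Entropy in bits of the eigenvalue distribution (p, 1 - p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_subspace(p, n, delta):
    """Dimension and total weight of the delta-typical subspace for n copies
    of a qubit density operator with eigenvalues p and 1 - p.

    An eigenstate of the n-fold product with k factors of eigenvalue 1 - p
    has eigenvalue p**(n - k) * (1 - p)**k; there are C(n, k) such states."""
    s = binary_entropy(p)
    dim, weight = 0, 0.0
    for k in range(n + 1):
        log_prob = (n - k) * math.log2(p) + k * math.log2(1 - p)
        if abs(-log_prob / n - s) <= delta:   # typicality condition
            count = math.comb(n, k)
            dim += count
            weight += count * 2.0 ** log_prob
    return dim, weight

p, delta = 0.9, 0.1   # illustrative values
s = binary_entropy(p)
for n in (200, 1000):
    dim, weight = typical_subspace(p, n, delta)
    # log2 of the dimension lies below n(S + delta), far below n itself,
    # while the weight Tr(P rho^n) approaches 1 as n grows.
    print(n, math.log2(dim), n * (s + delta), weight)
```

The weight returned here plays the role of $\mathrm{Tr}(P_\delta^{(n)} \rho^{\otimes n})$ in Eq. (132), and the dimension is the quantity bounded in Eq. (133).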
It follows that the density operator $\rho^{\otimes n}$ can be replaced with a density operator
$\tilde{\rho}^{\otimes n}$ with support on the typical subspace (take $\rho^{\otimes n}$ in the spectral
representation, where the matrix is diagonal with $2^{n \log d}$ eigenvalues
$p(x_1 \ldots x_n) = p(x_1) \cdots p(x_n)$, and replace all $p(x_1 \ldots x_n)$ that do not
correspond to typical sequences with zeros).

Before considering a compression/decompression scheme for quantum information,
we need a measure of the reliability of such a scheme in terms of the fidelity, as
in the case of classical information. The following definition generalizes the classical
notion of fidelity introduced in §2.1 (see [Jozsa, 1998, p. 70]): If $|\psi\rangle$ is any pure
quantum state and $\rho$ any mixed state, the fidelity between $\rho$ and $|\psi\rangle$ is:

$$F(\rho, |\psi\rangle) = \mathrm{Tr}(\rho|\psi\rangle\langle\psi|) = \langle\psi|\rho|\psi\rangle \tag{134}$$

which is the probability that a measurement of the projection operator $|\psi\rangle\langle\psi|$ in the
state $\rho$ yields the outcome 1, i.e., it is the probability that $\rho$ passes a test of being
found to be $|\psi\rangle$ on measurement. Note that for a pure state $\rho = |\phi\rangle\langle\phi|$,
$F(|\phi\rangle, |\psi\rangle) = |\langle\psi|\phi\rangle|^2$. The fidelity between two mixed states $\rho$ and
$\sigma$ is defined as (see footnote 9):

$$F(\rho, \sigma) = \max |\langle\psi|\phi\rangle|^2 = \Big(\mathrm{Tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}}\Big)^2 \tag{135}$$

for all purifications $|\psi\rangle$ of $\rho$ and $|\phi\rangle$ of $\sigma$. Note that in spite of
appearances, $F(\rho, \sigma)$ is symmetric in $\rho$ and $\sigma$.
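As a small worked instance of Eq. (134), the sketch below evaluates $\langle\psi|\rho|\psi\rangle$ for a qubit. The function name `fidelity_with_pure` and the example states are assumptions chosen for illustration; the mixed-state formula (135) is not implemented here.

```python
def fidelity_with_pure(rho, psi):
    """F(rho, |psi>) = <psi|rho|psi> for a 2x2 density matrix rho and a
    normalized qubit state psi, as in Eq. (134)."""
    total = 0j
    for r in range(2):
        for c in range(2):
            total += complex(psi[r]).conjugate() * rho[r][c] * psi[c]
    return total.real

maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]   # rho = I/2
pure_zero = [[1.0, 0.0], [0.0, 0.0]]         # rho = |0><0|
plus = [2 ** -0.5, 2 ** -0.5]                # (|0> + |1>)/sqrt(2)

print(fidelity_with_pure(maximally_mixed, plus))
print(fidelity_with_pure(pure_zero, plus))
```

Both values come out at one half (up to rounding): the maximally mixed state passes any pure-state test with probability 1/2, and for the pure case the result agrees with $|\langle\psi|\phi\rangle|^2$.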

In the case of a source of $n$-sequences of quantum states
$|\Psi_{i_1 \ldots i_n}\rangle = |\psi_{i_1}\rangle \cdots |\psi_{i_n}\rangle$ with prior probabilities
$p_{i_1 \ldots i_n} = p_{i_1} \cdots p_{i_n}$, a compression/decompression scheme
will in general yield a mixed state $\rho_{i_1 \ldots i_n}$. The average fidelity of a
compression/decompression scheme for an $n$-sequence of quantum states is defined as:

$$F_n = \sum_{\text{all } n\text{-sequences}} p_{i_1 \ldots i_n} \mathrm{Tr}\big(\rho_{i_1 \ldots i_n} |\Psi_{i_1 \ldots i_n}\rangle\langle\Psi_{i_1 \ldots i_n}|\big) \tag{136}$$

Schumacher's quantum source coding theorem (or quantum noiseless channel coding
theorem) for a quantum source that produces quantum states
$|\psi_1\rangle, \ldots, |\psi_k\rangle \in \mathcal{H}_d$ with probabilities $p_1, \ldots, p_k$ (so the
density operator corresponding to the output of the source is
$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$), states that

    for any $\epsilon, \delta > 0$: (i) there exists a compression/decompression scheme
    using $S(\rho) + \delta$ qubits per state for $n$-length sequences produced by the
    source that can be decompressed by the receiver with fidelity $F_n > 1 - \epsilon$,
    for sufficiently large $n$, and (ii) any compression/decompression scheme
    using $S(\rho) - \delta$ qubits per state for $n$-length sequences will have a fidelity
    $F_n < \epsilon$, for sufficiently large $n$.

A compression/decompression scheme for such a quantum source would go as
follows: The transmitter applies a unitary transformation $U$ in $\mathcal{H}_d^{\otimes n}$
(dimension $= d^n = 2^{n \log d}$) which maps any state in the typical subspace onto a linear
superposition of sequences of $n \log d$ qubits, where all but the first $nS(\rho)$ qubits are
in the state $|0\rangle$, and then transmits the first $nS(\rho)$ qubits to the receiver. So the
transmitter compresses $n \log d$ qubits to $nS(\rho)$ qubits. The receiver adds
$n \log d - nS(\rho)$ qubits in the state $|0\rangle$ and applies the unitary transformation
$U^{-1}$. Since the initial $nS(\rho)$ qubits will in general be slightly entangled with the
remaining $n \log d - nS(\rho)$ qubits, discarding these qubits amounts to tracing over the
associated dimensions, so replacing these qubits with the state $|0\rangle$ will produce a mixed
state $\tilde{\rho}^{n}$. After the receiver applies $U^{-1}$, the resulting state will pass
a test of being found to be the original state $|\Psi_{i_1 \ldots i_n}\rangle$ with fidelity greater
than $1 - \epsilon$.

9 Note that Nielsen and Chuang [2000, p. 409] define the fidelity $F(\rho, \sigma)$ as the square
root of the quantity defined here. If $\rho$ and $\sigma$ commute, they can be diagonalized in the
same basis. The definition then reduces to the definition of the classical fidelity between
two probability distributions defined by the eigenvalues of $\rho$ and $\sigma$, given in a footnote
in §2.1.

4 Entanglement Assisted Quantum Communication 

In this section I show how entanglement can be exploited as a channel for the reliable
transmission of quantum information. I discuss two related forms of entanglement
assisted communication: quantum teleportation in §4.1 and quantum dense coding in
§4.2.

4.1 Quantum Teleportation 

As mentioned in §1, Schrödinger introduced the term 'entanglement' to describe the
peculiar nonlocal correlations of the EPR-state in an extended two-part commentary
[1935, 1936] on the Einstein-Podolsky-Rosen argument [Einstein et al., 1935].
Schrödinger regarded entangled states as problematic because they allow the possibility of
what he called 'remote steering,' which he regarded as a mathematical artefact of the
Hilbert space theory and discounted as a physical possibility. As it turns out, quantum
teleportation is an experimentally confirmed application of remote steering between
two separated systems. This was first pointed out in a paper by Bennett, Brassard,
Crépeau, Jozsa, Peres, and Wootters [1993] and later experimentally confirmed
by several groups using a variety of different techniques [Bouwmeester et al., 1997,
Boschi et al., 1998, Furusawa et al., 1998, Nielsen et al., 1998].

In the 1935 paper, Schrödinger considered pure entangled states with a unique
biorthogonal decomposition, as well as cases like the EPR-state, where a biorthogonal
decomposition is non-unique. He showed that suitable measurements on one system
can fix the (pure) state of the entangled distant system, and that this state depends on
what observable one chooses to measure, not merely on the outcome of that measurement.
In the second paper, he showed that a 'sophisticated experimenter,' by performing
a suitable local measurement on one system, can 'steer' the distant system into
any mixture of pure states represented by its reduced density operator. So the distant
system can be steered (probabilistically, depending on the outcome of the local
measurement) into any pure state in the support of the reduced density operator, with a
nonzero probability that depends only on the pure state. For a mixture of linearly
independent states of the distant system, the steering can be done by performing a local
standard projection-valued measurement in a suitable basis. If the states are linearly
dependent, the experimenter performs a generalized measurement (associated with a
POVM), which amounts to enlarging the experimenter's Hilbert space by adding an
ancilla, so that the dimension of the enlarged Hilbert space is equal to the number of
linearly independent states. As indicated in §3.1.1, Schrödinger's analysis anticipated
the later result by Hughston, Jozsa, and Wootters [1993].

Suppose Alice and Bob, the traditional protagonists in any two-party communication
protocol, each holds one of a pair of qubits in the entangled state:

$$|\Psi\rangle = \frac{1}{\sqrt{2}}(|0\rangle_A|1\rangle_B - |1\rangle_A|0\rangle_B) \tag{137}$$

Bob's qubit separately is in the mixed state $\rho_B = I/2$, which can be interpreted as an
equal weight mixture of the orthogonal states $|0\rangle_B$, $|1\rangle_B$, or, equivalently, as an
infinity of other mixtures including, to take a specific example, the equal weight mixture
of the four nonorthogonal normalized states:

$$|\phi_1\rangle_B = \alpha|0\rangle_B + \beta|1\rangle_B$$
$$|\phi_2\rangle_B = \alpha|0\rangle_B - \beta|1\rangle_B$$
$$|\phi_3\rangle_B = \beta|0\rangle_B + \alpha|1\rangle_B$$
$$|\phi_4\rangle_B = \beta|0\rangle_B - \alpha|1\rangle_B$$

That is:

$$\rho_B = I/2 = \frac{1}{4}\big(|\phi_1\rangle\langle\phi_1| + |\phi_2\rangle\langle\phi_2| + |\phi_3\rangle\langle\phi_3| + |\phi_4\rangle\langle\phi_4|\big) \tag{138}$$

If Alice measures the observable with eigenstates $|0\rangle_A$, $|1\rangle_A$ on her qubit $A$, and
Bob measures the corresponding observable on his qubit $B$, Alice's outcomes will be
oppositely correlated with Bob's outcomes (0 with 1, and 1 with 0). If, instead, Alice
prepares an ancilla qubit $A'$ in the state $|\phi_1\rangle_{A'} = \alpha|0\rangle_{A'} + \beta|1\rangle_{A'}$ and
measures an observable on the pair of qubits $A' + A$ in her possession with eigenstates:

$$|1\rangle = (|0\rangle_{A'}|1\rangle_A - |1\rangle_{A'}|0\rangle_A)/\sqrt{2} \tag{139}$$

$$|2\rangle = (|0\rangle_{A'}|1\rangle_A + |1\rangle_{A'}|0\rangle_A)/\sqrt{2} \tag{140}$$

$$|3\rangle = (|0\rangle_{A'}|0\rangle_A - |1\rangle_{A'}|1\rangle_A)/\sqrt{2} \tag{141}$$

$$|4\rangle = (|0\rangle_{A'}|0\rangle_A + |1\rangle_{A'}|1\rangle_A)/\sqrt{2} \tag{142}$$

(the Bell states defining the Bell basis in $\mathcal{H}_{A'} \otimes \mathcal{H}_A$), she will obtain the
outcomes 1, 2, 3, 4 with equal probability of 1/4, and these outcomes will be correlated
with Bob's states $|\phi_1\rangle_B$, $|\phi_2\rangle_B$, $|\phi_3\rangle_B$, $|\phi_4\rangle_B$. That is, if Bob checks
to see whether his particle is in the state $|\phi_i\rangle_B$ when Alice reports that she obtained
the outcome $i$, he will find that this is always in fact the case. This follows because

$$|\phi_1\rangle_{A'}|\Psi\rangle = \frac{1}{2}\big(-|1\rangle|\phi_1\rangle_B - |2\rangle|\phi_2\rangle_B + |3\rangle|\phi_3\rangle_B - |4\rangle|\phi_4\rangle_B\big) \tag{143}$$

In this sense, Alice can steer Bob's particle into any mixture compatible with his density
operator $\rho_B = I/2$ by an appropriate local measurement.

What Schrödinger found problematic about entanglement was the possibility of
remote steering in the above sense [1935, p. 556]:

    It is rather discomforting that the theory should allow a system to be
    steered or piloted into one or the other type of state at the experimenter's
    mercy in spite of his having no access to it.

Now, remote steering in this probabilistic sense is precisely what makes quantum
teleportation possible. Suppose Alice and Bob share a pair of qubits in the entangled state
(137) and Alice is given a qubit $A'$ in an unknown state $|\phi_1\rangle$ that she would like to
send to Bob. There is no procedure by which Alice can determine the identity of the
unknown state, but even if she could, the amount of classical information that Alice
would have to send to Bob in order for him to prepare the state $|\phi_1\rangle$ is potentially
infinite, since the precise specification of a general normalized qubit state
$\alpha|0\rangle + \beta|1\rangle$ requires two real parameters (the number of independent parameters is
reduced from four to two because $|\alpha|^2 + |\beta|^2 = 1$ and the overall phase is irrelevant).
Alice could send the qubit itself to Bob, but the quantum information in the qubit state
might be corrupted by transmission through a possibly noisy environment.

Instead, for the cost of just two bits of classical information, Alice can succeed in
communicating the unknown quantum state $|\phi_1\rangle$ to Bob with perfect reliability. What
Alice does is to measure the 2-qubit system $A' + A$ in her possession in the Bell basis.
Depending on the outcome of her measurement, $i = 1, 2, 3$, or 4 with equal probability,
Bob's qubit will be steered into one of the states $|\phi_1\rangle_B$, $|\phi_2\rangle_B$, $|\phi_3\rangle_B$,
$|\phi_4\rangle_B$. If Alice communicates the outcome of her measurement to Bob (requiring the
transmission of two bits of classical information), Bob can apply one of four local unitary
transformations in his Hilbert space to obtain the state $|\phi_1\rangle_B$:

i = 1: do nothing, i.e., apply the identity transformation $I$

i = 2: apply the transformation $\sigma_z$

i = 3: apply the transformation $\sigma_x$

i = 4: apply the transformation $i\sigma_y$

where $\sigma_x, \sigma_y, \sigma_z$ are the Pauli spin matrices.
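The protocol just described can be traced through with explicit amplitudes. The sketch below is a toy state-vector simulation, not an account of any experimental implementation: the names `BELL`, `CORRECTION`, `teleport`, and `fidelity` are assumptions made for this illustration, the Bell basis is indexed as in Eqs. (139)-(142), and Bob's corrections follow the list above.

```python
import math

R = 1 / math.sqrt(2)
# Bell basis on the pair (A', A); amplitudes for |00>, |01>, |10>, |11>.
BELL = {
    1: [0, R, -R, 0],
    2: [0, R, R, 0],
    3: [R, 0, 0, -R],
    4: [R, 0, 0, R],
}
# Bob's corrections: I, sigma_z, sigma_x, i*sigma_y as 2x2 matrices.
CORRECTION = {
    1: [[1, 0], [0, 1]],
    2: [[1, 0], [0, -1]],
    3: [[0, 1], [1, 0]],
    4: [[0, 1], [-1, 0]],
}

def teleport(alpha, beta):
    """Return {outcome: (probability, Bob's state after his correction)}."""
    # Joint state of A', A, B: (alpha|0> + beta|1>)(|01> - |10>)/sqrt(2),
    # stored as amp[a_prime][a][b].
    singlet = [[0, R], [-R, 0]]
    unknown = [alpha, beta]
    amp = [[[unknown[ap] * singlet[a][b] for b in range(2)]
            for a in range(2)] for ap in range(2)]
    results = {}
    for i, bell in BELL.items():
        # Project (A', A) onto Bell state i; what remains is Bob's
        # unnormalized conditional state.
        bob = [sum(complex(bell[2 * ap + a]).conjugate() * amp[ap][a][b]
                   for ap in range(2) for a in range(2)) for b in range(2)]
        prob = sum(abs(x) ** 2 for x in bob)
        bob = [x / math.sqrt(prob) for x in bob]
        u = CORRECTION[i]
        fixed = [u[r][0] * bob[0] + u[r][1] * bob[1] for r in range(2)]
        results[i] = (prob, fixed)
    return results

def fidelity(state, alpha, beta):
    """|<phi|state>|^2 for the target state alpha|0> + beta|1>."""
    inner = (complex(alpha).conjugate() * state[0]
             + complex(beta).conjugate() * state[1])
    return abs(inner) ** 2

alpha, beta = 0.6, 0.8j   # an arbitrary normalized example state
for outcome, (prob, state) in teleport(alpha, beta).items():
    print(outcome, prob, fidelity(state, alpha, beta))
```

Each outcome occurs with probability 1/4 and, after the corresponding correction, Bob's state matches the input up to an overall phase, which is why the check uses the fidelity rather than equality of amplitudes.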

The trick that results in the communication of the state $|\phi_1\rangle$ from Alice to Bob,
without the qubit $A'$ literally traveling from Alice to Bob, is the ability afforded Alice
by the shared entangled state to correlate one of four measurement outcomes (each
occurring with probability 1/4) with one of four states that together represent a
particular decomposition of Bob's mixed state. The communication of the state of $A'$ is
completed by Bob's operation, which requires that Alice sends the two bits of classical
information about her measurement outcome to Bob. In the teleportation protocol,
the state of the particle $A'$ is destroyed by Alice's measurement and re-created as the
state of Bob's particle by Bob's operation; in fact, the systems $A$ and $A'$ end up in
an entangled state as the result of Alice's measurement. Note that if the state $|\phi_1\rangle$ of
$A'$ were not destroyed there would be two copies of the state, which would violate the
quantum 'no cloning' theorem. So neither Alice nor Bob, nor any other party, can gain
any information about the identity of the teleported state, because the recording of such
information in the state of another quantum system would amount to a partial copying
of the information in the teleported state.
Shared entanglement provides a secure and reliable channel for quantum communication.
This might be useful for the communication of quantum information between
parties in a cryptographic protocol, or for the transmission of quantum information be- 
tween the processing components of a quantum computer. It is a feature of an entangled 
state shared by two parties that the entanglement is not affected by noise in the environ- 
ment between them. So the reliability of the communication of quantum information by 
teleportation depends on the reliability of the required classical communication, which 
can be protected against noise by well-known techniques of error-correcting codes. An 
entangled state shared by two parties is also unaffected by changes in their relative 
spatial location. So Alice could teleport a quantum state to Bob without even knowing 
Bob's location, by broadcasting the two bits of information. 



4.2 Quantum Dense Coding 

We know from the Holevo bound (see §3.4) that the maximum amount of classical
information that can be reliably communicated by encoding the information in the
quantum state of a qubit is one bit, even though an arbitrarily large amount of classical
information can be encoded in the state of a qubit (by encoding symbols as nonorthogonal
quantum states). Quantum dense coding is a procedure, first pointed out by Bennett
and Wiesner [1992], for exploiting entanglement to double the amount of classical
information that can be communicated by a qubit.
Consider again the Bell states: 

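$|1\rangle = (|0\rangle|1\rangle - |1\rangle|0\rangle)/\sqrt{2}$ (144)

$|2\rangle = (|0\rangle|1\rangle + |1\rangle|0\rangle)/\sqrt{2}$ (145)

$|3\rangle = (|0\rangle|0\rangle - |1\rangle|1\rangle)/\sqrt{2}$ (146)

$|4\rangle = (|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2}$ (147)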

Suppose Alice and Bob share a pair of qubits in the state 

$|1\rangle = (|0\rangle_A|1\rangle_B - |1\rangle_A|0\rangle_B)/\sqrt{2}$ (148)

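By performing one of four local operations on the qubit in her possession, defined by
the unitary transformations in $\mathcal{H}_A$:

$U_1 = I$ (149)
$U_2 = \sigma_z$ (150)
$U_3 = \sigma_x$ (151)
$U_4 = i\sigma_y$ (152)

Alice can transform the state $|1\rangle$ of the qubit pair into any Bell state, up to an
overall sign that has no physical significance. For example:

$I|1\rangle = |1\rangle$ (153)

$\sigma_z|1\rangle = |2\rangle$ (154)

$\sigma_x|1\rangle = -|3\rangle$ (155)

$i\sigma_y|1\rangle = -|4\rangle$ (156)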
So to communicate two classical bits to Bob, Alice applies one of the four operations
above to her qubit and sends the qubit to Bob. Bob then performs a measurement
on the two qubits in the Bell basis. Since these are orthogonal states, he can distinguish 
the states and identify Alice's operation. 
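
The dense coding procedure can be checked numerically. The following Python sketch is
an illustration only and is not part of the original treatment. It assumes that
amplitudes are ordered as |00>, |01>, |10>, |11> with Alice's qubit first, and the helper
names (apply_on_alice, bell_measure) are hypothetical. Each of Alice's four operations is
applied to the shared state |1>, and Bob's Bell-basis measurement is modelled, for the
ideal noiseless case, as finding the Bell state with unit overlap (ignoring overall sign).

```python
import math

S = 1 / math.sqrt(2)

# Amplitudes ordered |00>, |01>, |10>, |11>; the first qubit is Alice's.
BELL = {
    1: [0.0, S, -S, 0.0],   # (|01> - |10>)/sqrt(2)
    2: [0.0, S, S, 0.0],    # (|01> + |10>)/sqrt(2)
    3: [S, 0.0, 0.0, -S],   # (|00> - |11>)/sqrt(2)
    4: [S, 0.0, 0.0, S],    # (|00> + |11>)/sqrt(2)
}

# Alice's four local operations as 2x2 matrices (rows = output, columns = input).
OPS = {
    "I": [[1, 0], [0, 1]],
    "sigma_z": [[1, 0], [0, -1]],
    "sigma_x": [[0, 1], [1, 0]],
    "i*sigma_y": [[0, 1], [-1, 0]],
}


def apply_on_alice(op, state):
    """Apply a single-qubit operator to the first qubit of a two-qubit state."""
    out = [0.0, 0.0, 0.0, 0.0]
    for a_in in (0, 1):           # Alice's input basis state
        for b in (0, 1):          # Bob's qubit is untouched
            for a_out in (0, 1):  # Alice's output basis state
                out[2 * a_out + b] += op[a_out][a_in] * state[2 * a_in + b]
    return out


def bell_measure(state):
    """Return the index of the Bell state whose overlap with `state` has modulus 1."""
    for index, bell in BELL.items():
        overlap = sum(x * y for x, y in zip(bell, state))  # all amplitudes are real here
        if abs(abs(overlap) - 1.0) < 1e-12:
            return index
    raise ValueError("state is not one of the four Bell states")


# Bob's decoding of each of Alice's four possible operations.
decoded = {name: bell_measure(apply_on_alice(op, BELL[1])) for name, op in OPS.items()}
```

Each operation is mapped to a different Bell-state index in `decoded`, which is what allows
Bob to recover two classical bits from the single transmitted qubit.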

5 Quantum Cryptography 

Over the past few years, quantum cryptography has emerged as perhaps the most successful
area of application of quantum information theoretic ideas. The main results
have been a variety of provably secure protocols for key distribution, following an
original proposal by Bennett and Brassard [1984], and an important 'no go' theorem
by Mayers [1996b, 1997] and Lo and Chau [1998]: the impossibility of unconditionally
secure two-party quantum bit commitment. The quantum bit commitment theorem
generalizes previous results restricted to one-way communication protocols by Mayers
[1996a] and by Lo and Chau [1997] and applies to quantum, classical, and quantum-classical
hybrid schemes (since classical information, as we have seen, can be regarded
as quantum information subject to certain constraints). The restriction to two-party
schemes excludes schemes that involve a trusted third party or trusted channel properties,
and the restriction to schemes based solely on the principles of quantum mechanics
excludes schemes that exploit special relativistic signalling constraints, or schemes that
might involve time machines or the thermodynamics of black holes, etc.

In §5.1 I show how the security of quantum key distribution depends on features
of quantum information (no cloning, no information gain without disturbance,
entanglement) that prevent an eavesdropper from secretly gaining information about
the quantum communication between two parties, i.e., completely undetectable
eavesdropping is in principle impossible for quantum communication. In §5.2 I discuss
quantum bit commitment and show why unconditionally secure quantum bit commitment
is impossible.

5.1 Key Distribution 

5.1.1 Quantum Key Distribution Protocols 

In a quantum key distribution protocol, the object is for two parties, Alice and Bob, 
who initially share no information, to exchange information via quantum and classical 
channels, so as to end up sharing a secret key which they can then use for encryption, in 
such a way as to ensure that any attempt by an eavesdropper, Eve, to gain information 
about the secret key will be detected with non-zero probability. 

The one-time pad provides a perfectly secure way for Alice and Bob to communicate
classical information, but this is also the only way that two parties can achieve
perfectly secure classical communication. The one-time pad is, essentially, a random
sequence of bits. If Alice and Bob both have a copy of the one-time pad, Alice can
communicate a message to Bob securely by converting the message to an n-bit binary
number (according to some scheme known to both Alice and Bob), and adding (bitwise,
modulo 2) the sequence of bits in the binary number to an n-length sequence
of bits from the top of the one-time pad. Alice sends the encrypted sequence to Bob,
which Bob can then decrypt using the same sequence of bits from his copy of the one-time
pad. Since the encrypted message is random, it is impossible for Eve to decrypt
the message without a copy of the one-time pad. It is essential to the security of the
scheme that the n random bits used to encrypt the message are discarded once the message
is transmitted and decrypted, and that a unique random sequence is used for each
distinct message (hence the term 'one-time pad').
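
The bitwise addition modulo 2 described above is the exclusive-or operation. As a
minimal sketch (the message bits and variable names are made up for illustration, and
Python's `secrets` module stands in for an ideal random source):

```python
import secrets


def xor_bits(bits, pad):
    """Add two equal-length bit sequences bitwise, modulo 2."""
    return [b ^ p for b, p in zip(bits, pad)]


message = [1, 0, 1, 1, 0, 0, 1, 0]            # Alice's n-bit message (illustrative)
pad = [secrets.randbits(1) for _ in message]  # n fresh bits from the shared pad
ciphertext = xor_bits(message, pad)           # what Alice sends to Bob
recovered = xor_bits(ciphertext, pad)         # Bob adds the same pad bits again
```

Adding the same pad bits twice restores the message, because every bit is its own inverse
under addition modulo 2. The pad bits used here must then be discarded.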

This procedure guarantees perfect privacy, so long as Alice and Bob, and only Alice 
and Bob, can each be assumed to possess a copy of an arbitrarily long one-time pad. 
But this means that in order for two parties to communicate secretly, they must already 
share a secret: the random key. The key distribution problem is the problem of how to 
distribute the key securely in the first place without the key being secretly intercepted 
during transmission and copied, and the key storage problem is the problem of how to 
store the key securely without it being secretly copied. We would like a procedure that 
can be guaranteed to be secure against passive eavesdropping, so that Alice and Bob 
can be confident that their communications are in fact private.

The key idea in quantum cryptography is to exploit two features of quantum information.
The first is the indistinguishability of nonorthogonal quantum states, which, as we saw
in §3.3, entails that any information gained by Eve about the identity of such states
will introduce some disturbance of the states that can be detected by Alice and Bob.
The second is the 'no cloning' theorem, which makes it impossible for Eve to copy
quantum communications between Alice and Bob and store them for later analysis
(perhaps using, in addition, intercepted classical communications between Alice and Bob).

A large variety of quantum key distribution schemes have been proposed following 
the original Bennett and Brassard protocol [1984], now known as BB84. The core idea
there was for Alice to send Bob a sequence of qubits, prepared with equal probability
in one of the states $|0\rangle$, $|1\rangle$, $|+\rangle$, $|-\rangle$, where the pair of orthogonal states
$|0\rangle$, $|1\rangle$ are nonorthogonal to the pair of orthogonal states $|+\rangle$, $|-\rangle$. Bob measures
each qubit randomly in either the basis $|0\rangle$, $|1\rangle$ or the basis $|+\rangle$, $|-\rangle$. Following his measurements,
he publicly broadcasts the basis he used for each qubit in the sequence, and Alice pub- 
licly broadcasts which of these bases is the same as the basis she used to prepare the 
qubit. Alice and Bob then discard the qubits for which their bases disagree. Since the 
outcome states of Bob's measurements are the same as the states Alice prepared, Alice 
and Bob share a random key on the remaining qubits. They can then sacrifice a portion 
of these qubits to detect eavesdropping. Alice publicly announces the qubit state she 
prepared and Bob checks his measurement outcome to confirm this. If they agree on a 
sufficient number of qubit states (depending on the expected error rate), they conclude 
that there has been no eavesdropping and use the remaining portion as the secret key. If 
they don't agree, they conclude that the qubits have been disturbed by eavesdropping, 
in which case they discard all the qubits and begin the procedure again. The actual 
protocol involves further subtleties in which a perfectly secure secret key is distilled 
from the 'raw key' obtained in this way by techniques of error correction and privacy 
amplification. 
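
The sifting and checking steps can be sketched with a classical simulation of ideal
single-qubit measurements. This is an illustration under stated assumptions, not the
protocol as specified in the literature: the bases are labelled "Z" for $|0\rangle$, $|1\rangle$
and "X" for $|+\rangle$, $|-\rangle$, the channel is noiseless, and Eve (when present) uses a
simple intercept-and-resend attack, measuring each qubit in a randomly chosen basis. The
function names are hypothetical.

```python
import random


def measure(bit, prep_basis, meas_basis, rng):
    """Ideal measurement: the same basis returns the prepared bit, the other a fair coin."""
    return bit if prep_basis == meas_basis else rng.randint(0, 1)


def bb84(n, eavesdrop, seed=0):
    """Return Alice's sifted bits, Bob's sifted bits, and their disagreement rate."""
    rng = random.Random(seed)
    alice_key, bob_key = [], []
    for _ in range(n):
        bit, a_basis = rng.randint(0, 1), rng.choice("ZX")
        sent_bit, sent_basis = bit, a_basis
        if eavesdrop:
            # Intercept-resend: Eve measures in a random basis and resends her result.
            e_basis = rng.choice("ZX")
            sent_bit, sent_basis = measure(bit, a_basis, e_basis, rng), e_basis
        b_basis = rng.choice("ZX")
        result = measure(sent_bit, sent_basis, b_basis, rng)
        if a_basis == b_basis:  # sifting: keep only the qubits where the bases agree
            alice_key.append(bit)
            bob_key.append(result)
    errors = sum(a != b for a, b in zip(alice_key, bob_key))
    return alice_key, bob_key, errors / len(alice_key)


_, _, rate_without_eve = bb84(2000, eavesdrop=False)
_, _, rate_with_eve = bb84(2000, eavesdrop=True)
```

Without Eve the sifted strings agree exactly. With this particular attack, roughly a
quarter of the sifted bits disagree, which is what the sacrificed portion of the key is
meant to reveal.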

The BB84 scheme solves the key distribution problem, in the sense that Alice and
Bob, who initially share no secrets, can end up sharing a secret key via a key distribution
protocol that excludes the possibility of eavesdropping, with arbitrarily high
reliability (since the length of the sequence of qubits sacrificed to detect eavesdropping
can be arbitrarily long). Clearly, it does not solve the key storage problem, since
the output of the key distribution protocol is stored as classical information, which is
subject to passive eavesdropping.

A scheme proposed by Ekert [1991] allows Alice and Bob to create a shared random
key by performing measurements on two entangled qubits. Suppose Alice and
Bob share many copies of an entangled pure state of two qubits, say the Bell state
$\frac{1}{\sqrt{2}}(|0\rangle|1\rangle - |1\rangle|0\rangle)$ (perhaps emitted by a common source of entangled pairs between
Alice and Bob). Alice and Bob agree on three observables that they each measure on
their qubits, where the measurements are chosen randomly and independently for each
qubit. After a sequence of measurements on an appropriate number of pairs, Alice and
Bob announce the directions of their measurements publicly and divide the measurements
into two groups: those in which they measured the spin in different directions,
and those in which they measured the spin in the same direction. They publicly reveal
the outcomes of the first group of measurements and use these to check that the
singlet states have not been disturbed by eavesdropping. Essentially, they calculate a
correlation coefficient: any attempt by an eavesdropper, Eve, to monitor the particles
will disturb the entangled state and result in a correlation coefficient that is bounded
by Bell's inequality and is therefore distinguishable from the correlation coefficient for
the entangled state. If Alice and Bob are satisfied that no eavesdropping has occurred,
they use the second group of oppositely correlated measurement outcomes as the key.
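
The idea behind the correlation test can be illustrated with the CHSH form of Bell's
inequality. The sketch below is an assumption-laden illustration rather than Ekert's
actual choice of observables: it uses spin measurements in the x-z plane at the standard
CHSH angles, and it models eavesdropping crudely as a measurement of one particle along
z, which leaves the pair in an equal mixture of $|0\rangle|1\rangle$ and $|1\rangle|0\rangle$.

```python
import math

# Singlet state amplitudes, ordered |00>, |01>, |10>, |11>.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]


def spin(theta):
    """Spin observable along a direction at angle theta in the x-z plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def correlation(theta_a, theta_b, state=SINGLET):
    """Expectation value of spin(theta_a) (x) spin(theta_b) in a real two-qubit state."""
    A, B = spin(theta_a), spin(theta_b)
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            for a2 in (0, 1):
                for b2 in (0, 1):
                    total += state[2 * a2 + b2] * A[a2][a] * B[b2][b] * state[2 * a + b]
    return total


def disturbed_correlation(theta_a, theta_b):
    """Correlation after one particle has been measured along z (assumed attack)."""
    return 0.5 * (correlation(theta_a, theta_b, [0.0, 1.0, 0.0, 0.0])
                  + correlation(theta_a, theta_b, [0.0, 0.0, 1.0, 0.0]))


def chsh(E):
    """CHSH combination of correlations at the standard angles."""
    a, a2, b, b2 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)


s_singlet = chsh(correlation)
s_disturbed = chsh(disturbed_correlation)
```

For the undisturbed singlet the magnitude of `s_singlet` is $2\sqrt{2}$, above the bound of 2
that Bell's inequality imposes, while `s_disturbed` stays within the bound.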



5.1.2 Quantum Key Distribution via Pre- and Post-Selection 

The Ekert scheme solves the key distribution problem as well as the key storage problem,
because a new key is generated for each message from the stored entangled states,
and there is no information about the key in the entangled states. Here I describe a key
distribution protocol that also involves entangled states (see [Bub, 2001b]), but with a
different type of test for eavesdropping. Instead of a statistical test based on Bell's
theorem, the test exploits conditional statements about measurement outcomes generated
by pre- and post-selected quantum states.

The peculiar features of pre- and post-selected quantum states were first pointed
out by Aharonov, Bergmann, and Lebowitz [1964]. If:

(i) Alice prepares a system in a certain state $|\text{pre}\rangle$ at time $t_1$,

(ii) Bob measures some observable $M$ on the system at time $t_2$,

(iii) Alice measures an observable of which $|\text{post}\rangle$ is an eigenstate at time $t_3$, and
post-selects for $|\text{post}\rangle$,

then Alice can assign probabilities to the outcomes of Bob's $M$-measurement at $t_2$,
conditional on the states $|\text{pre}\rangle$ and $|\text{post}\rangle$ at times $t_1$ and $t_3$, respectively, as follows
[Aharonov et al., 1964], [Vaidman et al., 1987]:

$\text{prob}(q_k) = \frac{|\langle\text{pre}|P_k|\text{post}\rangle|^2}{\sum_i |\langle\text{pre}|P_i|\text{post}\rangle|^2}$ (157)
where $P_i$ is the projection operator onto the $i$'th eigenspace of $M$. Notice that (157),
referred to as the 'ABL-rule' (Aharonov-Bergmann-Lebowitz rule) in the following, is
time-symmetric, in the sense that the states $|\text{pre}\rangle$ and $|\text{post}\rangle$ can be interchanged.

If $M$ is unknown to Alice, she can use the ABL-rule to assign probabilities to
the outcomes of various hypothetical $M$-measurements. The interesting peculiarity of
the ABL-rule, by contrast with the usual Born rule for pre-selected states, is that it is
possible, for an appropriate choice of observables $M$, $M'$, ..., and states $|\text{pre}\rangle$ and
$|\text{post}\rangle$, to assign unit probability to the outcomes of a set of mutually noncommuting
observables. That is, Alice can be in a position to assert a conjunction of conditional
statements of the form: 'If Bob measured $M$, then the outcome must have been $m_i$,
with certainty, and if Bob measured $M'$, then the outcome must have been $m'_j$, with
certainty, ...,' where $M$, $M'$, ... are mutually noncommuting observables. Since Bob
could only have measured at most one of these noncommuting observables, Alice's
conditional information does not, of course, contradict quantum mechanics: she only
knows the eigenvalue $m_i$ of an observable $M$ if she knows that Bob in fact measured
$M$.
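
A simple single-particle case shows how the ABL-rule can assign unit probability to the
outcomes of two noncommuting observables. The following sketch is an illustration and
not the example discussed in the text: it assumes a spin-1/2 particle pre-selected in the
spin-up eigenstate of $\sigma_x$ and post-selected in the spin-up eigenstate of $\sigma_z$,
and the function `abl` is a hypothetical helper restricted to nondegenerate observables,
for which each projection operator is built from a single eigenvector.

```python
import math


def inner(u, v):
    """Inner product <u|v> for vectors given as lists of amplitudes."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def abl(pre, post, eigenbasis):
    """ABL probabilities for a nondegenerate observable with the given eigenbasis.

    For a rank-one projector P_k = |e_k><e_k|, <pre|P_k|post> = <pre|e_k><e_k|post>.
    """
    weights = [abs(inner(pre, e) * inner(e, post)) ** 2 for e in eigenbasis]
    total = sum(weights)
    return [w / total for w in weights]


s = 1 / math.sqrt(2)
up_z, down_z = [1.0, 0.0], [0.0, 1.0]
up_x, down_x = [s, s], [s, -s]

pre, post = up_x, up_z
p_x = abl(pre, post, [up_x, down_x])  # hypothetical intermediate sigma_x measurement
p_z = abl(pre, post, [up_z, down_z])  # hypothetical intermediate sigma_z measurement
```

Both `p_x` and `p_z` assign probability 1 to the spin-up outcome, although $\sigma_x$ and
$\sigma_z$ do not commute. Interchanging `pre` and `post` gives the same probabilities.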

Vaidman, Aharonov, and Albert [1987] discuss a case of this sort, where the outcome
of a measurement of any of the three spin observables $X = \sigma_x$, $Y = \sigma_y$, $Z = \sigma_z$
of a spin-1/2 particle can be inferred from an appropriate pre- and post-selection. Alice
prepares the Bell state

$|\text{pre}\rangle = \frac{1}{\sqrt{2}}(|\uparrow_z\rangle_A|\uparrow_z\rangle_C + |\downarrow_z\rangle_A|\downarrow_z\rangle_C)$ (158)

where $|\uparrow_z\rangle$ and $|\downarrow_z\rangle$ denote the $\sigma_z$-eigenstates. Alice sends one of the particles
(the channel particle, denoted by the subscript $C$) to Bob and keeps the ancilla, denoted
by $A$. Bob measures either $X$, $Y$, or $Z$ on the channel particle and returns the channel
particle to Alice. Alice then measures an observable $R$ on the pair of particles, where
$R$ has the eigenstates (the subscripts $A$ and $C$ are suppressed):
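Note that:

$|\text{pre}\rangle = \frac{1}{\sqrt{2}}(|\uparrow_z\rangle|\uparrow_z\rangle + |\downarrow_z\rangle|\downarrow_z\rangle)$ (163)
$= \frac{1}{\sqrt{2}}(|\uparrow_x\rangle|\uparrow_x\rangle + |\downarrow_x\rangle|\downarrow_x\rangle)$ (164)
$= \frac{1}{\sqrt{2}}(|\uparrow_y\rangle|\downarrow_y\rangle + |\downarrow_y\rangle|\uparrow_y\rangle)$ (165)
$= \frac{1}{2}(|r_1\rangle + |r_2\rangle + |r_3\rangle + |r_4\rangle)$ (166)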



Alice can now assign values to the outcomes of Bob's spin measurements via the
ABL-rule, whether Bob measured $X$, $Y$, or $Z$, based on the post-selections $|r_1\rangle$, $|r_2\rangle$,
$|r_3\rangle$, or $|r_4\rangle$, according to Table 1 (where 0 represents the outcome $\uparrow$ and 1 represents
the outcome $\downarrow$) [Vaidman et al., 1987]:
Table 1: $\sigma_x$, $\sigma_y$, $\sigma_z$ measurement outcomes correlated with eigenvalues of $R$

This case can be exploited to enable Alice and Bob to share a private random key in
the following way: Alice prepares a certain number of copies (depending on the length
of the key and the level of privacy desired) of the Bell state $|\text{pre}\rangle$ in Eq. (158). She
sends the channel particles to Bob in sequence and keeps the ancillas. Bob measures $X$
or $Z$ randomly on the channel particles and returns the particles, in sequence, to Alice.
Alice then measures the observable $R$ on the ancilla and channel pairs and divides
the sequence into two subsequences: the subsequence $S_{14}$ for which she obtained the
outcomes $r_1$ or $r_4$, and the subsequence $S_{23}$ for which she obtained the outcomes
$r_2$ or $r_3$. The sequence of operations can be implemented on a quantum circuit; see
[Metzger, 2000].

To check that the channel particles have not been monitored by Eve, Alice now
publicly announces (broadcasts) the indices of the subsequence $S_{23}$. As is evident from
Table 1, for this subsequence she can make conditional statements of the form: 'For
channel particle $i$, if $X$ was measured, the outcome was 1 (0), and if $Z$ was measured,
the outcome was 0 (1),' depending on whether the outcome of her $R$-measurement was
$r_2$ or $r_3$. She publicly announces these statements as well. If one of these statements,
for some index $i$, does not agree with Bob's records, Eve must have monitored the
$i$'th channel particle. (Of course, agreement does not entail that the particle was not
monitored.)

For suppose Eve measures a different spin component observable than Bob on a
channel particle and Alice subsequently obtains one of the eigenvalues $r_2$ or $r_3$ when
she measures $R$. Bob's measurement outcome, either 0 or 1, will be compatible with
just one of these eigenvalues, assuming no intervention by Eve. But after Eve's measurement,
both of these eigenvalues will be possible outcomes of Alice's measurement.
So Alice's retrodictions of Bob's measurement outcomes for the subsequence $S_{23}$ will
not necessarily correspond to Bob's records. In fact, one can show that if Eve measures
$X$ or $Z$ randomly on the channel particles, or if she measures a particular one of the
observables $X$, $Y$, or $Z$ on the channel particles (the same observable on each particle),
the probability of detection in the subsequence $S_{23}$ is 3/8.

In the subsequence $S_{14}$, the 0 and 1 outcomes of Bob's measurements correspond
to the outcomes $r_1$ and $r_4$ of Alice's $R$-measurements. If, following their public
communication about the subsequence $S_{23}$, Alice and Bob agree that there has been no
monitoring of the channel particles by Eve, they use the subsequence $S_{14}$ to define a
shared raw key.

Note that even a single disagreement between Alice's retrodictions and Bob's records
is sufficient to reveal that the channel particles have been monitored by Eve. This
differs from the eavesdropping test in the Ekert protocol, which is statistical. Note also
that Eve only has access to the channel particles, not the particle pairs. So no strategy
is possible in which Eve replaces all the channel particles with her own particles and
entangles the original channel particles, treated as a single system, with an ancilla by
some unitary transformation, and then delays any measurements until after Alice and Bob
have communicated publicly. There is no way that Eve can ensure agreement between Alice and
Bob without having access to the particle pairs, or without information about Bob's
measurements.

The key distribution protocol as outlined above solves the key distribution problem
but not the key storage problem. If Bob actually makes the random choices, measures
$X$ or $Z$, and records definite outcomes for the spin measurements before Alice measures
$R$, as required by the protocol, Bob's measurement records (stored as classical
information) could in principle be copied by Eve without detection. In that case, Eve
would know the raw key (which is contained in this information), following the public
communication between Alice and Bob to verify the integrity of the quantum communication
channel.

To solve the key storage problem, the protocol is modified in the following way:
Instead of actually making the random choice for each channel particle, measuring one
of the spin observables, and recording the outcome of the measurement, Bob keeps
the random choices and the spin measurements 'at the quantum level' until after Alice
announces the indices of the subsequence $S_{23}$ of her $R$ measurements. To do this, Bob
enlarges the Hilbert space by entangling the quantum state of the channel particle via
a unitary transformation with the states of two ancilla particles that he introduces. One
particle is associated with a Hilbert space spanned by two eigenstates, $|d_X\rangle$ and $|d_Z\rangle$,
of a choice observable or 'quantum die' observable $D$. The other particle is associated
with a Hilbert space spanned by two eigenstates, $|p_\uparrow\rangle$ and $|p_\downarrow\rangle$, of a pointer observable
$P$. (See §5.2.2 for a discussion of how to implement the unitary transformation on the
enlarged Hilbert space.)

On the modified protocol (assuming the ability to store entangled states indefinitely),
Alice and Bob share a large number of copies of an entangled 4-particle state.
When they wish to establish a random key of a certain length, Alice measures $R$ on an
appropriate number of particle pairs in her possession and announces the indices of the
subsequence $S_{23}$. Before Alice announces the indices of the subsequence $S_{23}$, neither
Alice nor Bob has stored any classical information. So there is nothing for Eve to
copy. After Alice announces the indices of the subsequence $S_{23}$, Bob measures the
observables $D$ and $P$ on his ancillas with these indices and announces the pointer reading
$|p_\uparrow\rangle$ or $|p_\downarrow\rangle$ as the outcome of his $X$ or $Z$ measurement, depending on the outcome of
the $D$ measurement. If Alice and Bob decide that there has been no eavesdropping by Eve,
Bob measures $D$ and $P$ on his ancillas in the subsequence $S_{14}$. It is easy to see that
the ABL-rule applies in this case, just as it applies in the case where Bob actually makes
the random choice and actually records definite outcomes of his $X$ or $Z$ measurements
before Alice measures $R$. In fact, if the two cases were not equivalent for Alice (if Alice
could tell from her $R$-measurements whether Bob had actually made the random choice and
actually performed the spin measurements, or had merely implemented these actions
'at the quantum level'), the difference could be exploited to signal superluminally.

5.2 Bit Commitment 
5.2.1 Some History 

In a bit commitment protocol, one party, Alice, supplies an encrypted bit to a second 
party, Bob. The information available in the encrypted bit should be insufficient for 
Bob to ascertain the value of the bit, but sufficient, together with further information 
supplied by Alice at a subsequent stage when she is supposed to reveal the value of 
the bit, for Bob to be convinced that the protocol does not allow Alice to cheat by 
encrypting the bit in a way that leaves her free to reveal either 0 or 1 at will.

To illustrate the idea, suppose Alice claims the ability to predict the outcomes of 
elections. To substantiate her claim without revealing valuable information (perhaps to 
a potential employer, Bob) she suggests the following demonstration: She proposes to 
record her prediction about whether a certain candidate will win or lose by writing a
0 (for 'lose') or a 1 (for 'win') on a note a month before the election. She will then lock
the note in a safe and hand the safe to Bob, but keep the key. After the election, she
will announce the bit she chose and prove that she in fact made the commitment at the
earlier time by handing Bob the key. Bob can then open the safe and read the note.

Obviously, the security of this procedure depends on the strength of the safe walls or 
the ingenuity of the locksmith. More generally, Alice can send (encrypted) information 
to Bob that guarantees the truth of an exclusive classical disjunction (equivalent to her 
commitment to a 0 or a 1) only if the information is biased towards one of the alternative
disjuncts (because a classical exclusive disjunction is true if and only if one of the 
disjuncts is true and the other false). No principle of classical mechanics precludes 
Bob from extracting this information, so the security of a classical bit commitment 
protocol can only be a matter of computational complexity. 

The question is whether there exists a quantum analogue of this procedure that 
is unconditionally secure: provably secure as a matter of physical law (according to 
quantum theory) against cheating by either Alice or Bob. Note that Bob can cheat if 
he can obtain some information about Alice's commitment before she reveals it (which 
would give him an advantage in repetitions of the protocol with Alice). Alice can cheat
if she can delay actually making a commitment until the final stage when she is required 
to reveal her commitment, or if she can change her commitment at the final stage with 
a very low probability of detection. 

Bennett and Brassard originally proposed a quantum bit commitment protocol in
[1984]. The basic idea was to associate the 0 and 1 commitments with two different
mixtures represented by the same density operator. As they showed in the same paper,
Alice can cheat by adopting an 'EPR attack' or cheating strategy: she prepares entangled
pairs of qubits, keeps one of each pair (the ancilla) and sends the second qubit
(the channel particle) to Bob. In this way she can fake sending one of two equivalent
mixtures to Bob and reveal either bit at will at the opening stage by effectively steering
Bob's particle into the desired mixture by an appropriate measurement. Bob cannot
detect this cheating strategy.
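
The EPR attack can be made concrete with a small numerical sketch. The specific choice
of mixtures is an assumption made for illustration (the text above does not specify
them): the 0 commitment is taken to be an equal mixture of $|0\rangle$ and $|1\rangle$, the 1
commitment an equal mixture of $|+\rangle$ and $|-\rangle$, and Alice's entangled pair is taken to
be $(|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2}$. The names `density` and `steer` are hypothetical.

```python
import math

s = 1 / math.sqrt(2)
KET = {"0": [1.0, 0.0], "1": [0.0, 1.0], "+": [s, s], "-": [s, -s]}


def density(states):
    """Density matrix of an equal-weight mixture of real single-qubit pure states."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for psi in states:
        for i in range(2):
            for j in range(2):
                rho[i][j] += psi[i] * psi[j] / len(states)
    return rho


rho_commit_0 = density([KET["0"], KET["1"]])
rho_commit_1 = density([KET["+"], KET["-"]])

# Entangled pair (|00> + |11>)/sqrt(2): PAIR[a][b], a = Alice's ancilla, b = Bob's qubit.
PAIR = [[s, 0.0], [0.0, s]]


def steer(alice_outcome):
    """Bob's normalized state when Alice finds her ancilla in the named (real) state."""
    a = KET[alice_outcome]
    bob = [sum(a[i] * PAIR[i][b] for i in range(2)) for b in range(2)]
    norm = math.sqrt(sum(x * x for x in bob))
    return [x / norm for x in bob]
```

The two density matrices coincide, so Bob cannot tell the commitments apart. By measuring
her ancilla in the $|0\rangle$, $|1\rangle$ basis or in the $|+\rangle$, $|-\rangle$ basis at the opening stage,
Alice leaves Bob's qubit in a member of whichever mixture she wishes to claim.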

In a later paper, Brassard, Crépeau, Jozsa, and Langlois [1993] proposed a quantum
bit commitment protocol that they claimed to be unconditionally secure. The
BCJL scheme was first shown to be insecure by Mayers [1996a]. Subsequently, Mayers
[1996b, 1997] and Lo and Chau [1997, 1998] independently showed that the insight
of Bennett and Brassard in [1984] can be extended to a proof that a generalized version
of the EPR cheating strategy can always be applied, if the Hilbert space is enlarged in
a suitable way by introducing additional ancilla particles.

The impossibility of unconditionally secure quantum bit commitment came as something
of a surprise to the community of quantum cryptologists and has profound consequences.
Indeed, it would not be an exaggeration to say that the significance of
the quantum bit commitment theorem for our understanding of quantum mechanics
is comparable to Bell's theorem [Bell, 1964]. Brassard and Fuchs have speculated
([Brassard, 2000], [Fuchs, 1997], [Fuchs, 2000], [Fuchs and Jacobs, 2002]) that quantum
mechanics can be derived from two postulates about quantum information: the possibility
of secure key distribution and the impossibility of secure bit commitment. We
shall see in §7 what this means for the foundations of quantum mechanics.

Perhaps because of the simplicity of the proof and the universality of the claim, 
the quantum bit commitment theorem is continually challenged in the literature, on 
the basis that the proof does not cover all possible procedures that might be exploited 
to implement quantum bit commitment (see, e.g., Yuen [2005]). There seems to be
a general feeling that the theorem is 'too good to be true' and that there must be a 
loophole. 

In fact, there is no loophole. While Kent [1999a, 1999b] has shown how to implement
a secure classical bit commitment protocol by exploiting relativistic signalling
constraints in a timed sequence of communications between verifiably separated sites
for both Alice and Bob, and Hardy and Kent [2004] and Aharonov, Ta-Shma, Vazirani,
and Yao [2005] have investigated the security of 'cheat-sensitive' or 'weak' versions
of quantum bit commitment, these results are not in conflict with the quantum bit commitment
theorem. In a bit commitment protocol as usually understood, there is a time
interval of arbitrary length, where no information is exchanged, between the end of
the commitment stage of the protocol and the opening or unveiling stage, when Alice
reveals the value of the bit. Kent's ingenious scheme effectively involves a third
stage between the commitment stage and the unveiling stage, in which information is
exchanged between Bob's sites and Alice's sites at regular intervals until one of Alice's
sites chooses to unveil the originally committed bit. At this moment of unveiling the
protocol is not yet complete, because a further sequence of unveilings is required between
Alice's sites and corresponding sites of Bob before Bob has all the information
required to verify the commitment at a single site. If a bit commitment protocol is
understood to require an arbitrary amount of free time between the end of the commitment
stage and the opening stage (in which no step is to be executed in the protocol), then
the quantum bit commitment theorem covers protocols that exploit special relativistic
signalling constraints.



5.2.2 A Key Observation 

The crucial insight underlying the proof of the quantum bit commitment theorem is 
that any step in a quantum bit commitment protocol that requires Alice or Bob to make 
a definite choice (whether to perform one of a number of alternative measurements, 
or whether to implement one of a number of alternative unitary transformations) can 
always be replaced by an EPR cheating strategy in the generalized sense, assuming 
that Alice and Bob are both equipped with quantum computers. That is, a classical 
disjunction over definite possibilities — this operation or that operation — can always 
be replaced by a quantum entanglement and a subsequent measurement (perhaps at a 
more convenient time for the cheater) in which one of the possibilities becomes defi- 
nite. Essentially, the classical disjunction is replaced by a quantum disjunction. This 
cheating strategy cannot be detected. Similarly, a measurement can be 'held at the 
quantum level' without detection: instead of performing the measurement and obtain- 
ing a definite outcome as one of a number of possible outcomes, a suitable unitary 
transformation can be performed on an enlarged Hilbert space, in which the system 
is entangled with a 'pointer' ancilla in an appropriate way, and the procedure of ob- 
taining a definite outcome can be delayed. The key point is the possibility of keeping 
the series of transactions between Alice and Bob at the quantum level by enlarging the 
Hilbert space, until the final exchange of classical information when Alice reveals her 
commitment. 

Any quantum bit commitment scheme will involve a series of transactions between
Alice and Bob, where a certain number, $n$, of quantum systems (the 'channel
particles') are passed between them and subjected to various quantum operations (unitary
transformations, measurements, etc.), possibly chosen randomly. These operations
can always be replaced, without detection, by entangling a channel particle with one
or more ancilla particles that function as 'pointer' particles for measurements or 'die'
particles for random choices. In effect, this is the (generalized) EPR cheating strategy.

To illustrate: Suppose, at a certain stage of a quantum bit commitment protocol, 
that Bob is required to make a random choice between measuring one of two observables,
$X$ or $Y$, on each channel particle he receives from Alice. For simplicity, assume
that $X$ and $Y$ each have two eigenvalues, $x_1$, $x_2$ and $y_1$, $y_2$. After recording the outcome
of the measurement, Bob is required to return the channel particle to Alice. When
Alice receives the i'th channel particle she sends Bob the next channel particle in the 
sequence. We may suppose that the measurement outcomes that Bob records form part 
of the information that enables him to confirm Alice's commitment, once she discloses 
it (together with further information), so he is not required to report his measurement 
outcomes to Alice until the final stage of the protocol when she reveals her commit- 
ment. 

Instead of following the protocol, Bob can construct a device that entangles the
input state $|\psi\rangle_C$ of a channel particle with the initial states, $|d_0\rangle_B$ and $|p_0\rangle_B$, of two
ancilla particles that he introduces, the first of which functions as a 'quantum die' for
the random choice and the second as a 'quantum pointer' for the measurement. It
is assumed that Bob's ability to construct such a device (in effect, a special purpose
quantum computer) is restricted only by the laws of quantum mechanics.

10 I am indebted to Dominic Mayers for clarifying this point.

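The entanglement is implemented by a unitary transformation in the following
way:$^{11}$ Define two unitary transformations, $U_X$ and $U_Y$, that implement the $X$ and
$Y$ measurements 'at the quantum level' on the tensor product of the Hilbert space of
the channel particle, $\mathcal{H}_C$, and the Hilbert space of Bob's pointer ancilla, $\mathcal{H}_{B_P}$:

$|x_1\rangle_C|p_0\rangle_B \xrightarrow{U_X} |x_1\rangle_C|p_1\rangle_B$
$|x_2\rangle_C|p_0\rangle_B \xrightarrow{U_X} |x_2\rangle_C|p_2\rangle_B$ (167)

and

$|y_1\rangle_C|p_0\rangle_B \xrightarrow{U_Y} |y_1\rangle_C|p_1\rangle_B$
$|y_2\rangle_C|p_0\rangle_B \xrightarrow{U_Y} |y_2\rangle_C|p_2\rangle_B$ (168)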



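so that

$|\psi\rangle_C|p_0\rangle_B \xrightarrow{U_X} \langle x_1|\psi\rangle|x_1\rangle_C|p_1\rangle_B + \langle x_2|\psi\rangle|x_2\rangle_C|p_2\rangle_B$ (169)

and

$|\psi\rangle_C|p_0\rangle_B \xrightarrow{U_Y} \langle y_1|\psi\rangle|y_1\rangle_C|p_1\rangle_B + \langle y_2|\psi\rangle|y_2\rangle_C|p_2\rangle_B$ (170)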

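The random choice is defined similarly by a unitary transformation $V$ on the tensor
product of the Hilbert space of Bob's die ancilla, $\mathcal{H}_{B_D}$, and the Hilbert space
$\mathcal{H}_C \otimes \mathcal{H}_{B_P}$. Suppose $|d_X\rangle$ and $|d_Y\rangle$ are two orthogonal states in $\mathcal{H}_{B_D}$ and that
$|d_0\rangle = \frac{1}{\sqrt{2}}|d_X\rangle + \frac{1}{\sqrt{2}}|d_Y\rangle$. Then (suppressing the obvious subscripts) $V$ is defined by:

$|d_X\rangle \otimes |\psi\rangle|p_0\rangle \xrightarrow{V} |d_X\rangle \otimes U_X|\psi\rangle|p_0\rangle$
$|d_Y\rangle \otimes |\psi\rangle|p_0\rangle \xrightarrow{V} |d_Y\rangle \otimes U_Y|\psi\rangle|p_0\rangle$ (171)

so that

$|d_0\rangle \otimes |\psi\rangle|p_0\rangle \xrightarrow{V} \frac{1}{\sqrt{2}}|d_X\rangle \otimes U_X|\psi\rangle|p_0\rangle + \frac{1}{\sqrt{2}}|d_Y\rangle \otimes U_Y|\psi\rangle|p_0\rangle$ (172)

where the tensor product symbol has been introduced selectively to indicate that $U_X$
and $U_Y$ are defined on $\mathcal{H}_C \otimes \mathcal{H}_{B_P}$.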

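If Bob were to actually choose the observable $X$ or $Y$ randomly, and actually perform
the measurement and obtain a particular eigenvalue, Alice's density operator for
the channel particle would be:

$\frac{1}{2}(|\langle x_1|\psi\rangle|^2 |x_1\rangle\langle x_1| + |\langle x_2|\psi\rangle|^2 |x_2\rangle\langle x_2|)$
$+ \frac{1}{2}(|\langle y_1|\psi\rangle|^2 |y_1\rangle\langle y_1| + |\langle y_2|\psi\rangle|^2 |y_2\rangle\langle y_2|)$ (173)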

1 1 Note that there is no loss of generality in assuming that the channel particle is in a pure state. If the chan- 
nel particle is entangled with Alice's ancillas, the device implements the entanglement via the transformation 
I (g) • • •, where / is the identity operator in the Hilbert space of Alice's ancillas. 



48 



assuming that Alice does not know what observable Bob chose to measure, nor what 
outcome he obtained. But this is precisely the same density operator generated by 
tracing over Bob's ancilla partides for the state produced in ( 11721 . In other words, 
the density operator for the channel particle is the same for Alice, whether Bob ran- 
domly chooses which observable to measure and actually performs the measurement, 
or whether he implements an EPR cheating strategy with his two ancillas that produces 
the transition 11721 on the enlarged Hilbert space. 
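
The equality of the two density operators can be checked numerically. The following is a
minimal sketch, assuming a qubit channel particle, with X taken to be the computational
basis and Y the basis $(|0\rangle \pm |1\rangle)/\sqrt{2}$. These bases, the test state, and all
function names are illustrative assumptions, not part of the protocol itself.

```python
import math

S = 1 / math.sqrt(2)

# Assumed qubit eigenbases (illustrative, not fixed by the text).
X_BASIS = [(1 + 0j, 0j), (0j, 1 + 0j)]            # |x1>, |x2>
Y_BASIS = [(S + 0j, S + 0j), (S + 0j, -S + 0j)]   # |y1>, |y2>
BASES = [X_BASIS, Y_BASIS]                        # die state d_X -> X, d_Y -> Y


def inner(u, v):
    """Inner product <u|v> of two qubit vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def honest_density(psi):
    """Mixture (173): Bob really chooses X or Y at random and really measures."""
    rho = [[0j, 0j], [0j, 0j]]
    for basis in BASES:
        for b in basis:
            weight = 0.5 * abs(inner(b, psi)) ** 2
            for r in range(2):
                for c in range(2):
                    rho[r][c] += weight * b[r] * b[c].conjugate()
    return rho


def cheating_amplitudes(psi):
    """Amplitudes amp[d][p][c] of the entangled state (172).

    d indexes the die ancilla (d_X, d_Y), p the pointer ancilla (p_1, p_2),
    and c the channel qubit.
    """
    amp = [[[0j, 0j] for _ in range(2)] for _ in range(2)]
    for d, basis in enumerate(BASES):
        for p, b in enumerate(basis):
            coeff = S * inner(b, psi)
            for c in range(2):
                amp[d][p][c] = coeff * b[c]
    return amp


def trace_out_ancillas(amp):
    """Reduced density operator of the channel qubit (partial trace over d and p)."""
    rho = [[0j, 0j], [0j, 0j]]
    for d in range(2):
        for p in range(2):
            for r in range(2):
                for c in range(2):
                    rho[r][c] += amp[d][p][r] * amp[d][p][c].conjugate()
    return rho


def max_difference(m1, m2):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(m1[r][c] - m2[r][c]) for r in range(2) for c in range(2))


psi = (0.6 + 0j, 0.8j)  # an arbitrary normalized channel state
gap = max_difference(honest_density(psi), trace_out_ancillas(cheating_amplitudes(psi)))
print(gap < 1e-12)
```

The partial trace is taken over orthonormal die and pointer states, which is why the
cross terms between different ancilla outcomes drop out and only the mixture survives.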

If Bob is required to eventually report what measurement he performed and what 
outcome he obtained, he can at that stage measure the die ancilla for the eigenstate 
$|d_X\rangle$ or $|d_Y\rangle$, and then measure the pointer ancilla for the eigenstate $|p_1\rangle$ or $|p_2\rangle$.
In effect, if we consider the ensemble of possible outcomes for the two measurements, Bob
will have converted the 'improper' mixture generated by tracing over his ancillas to a 
'proper' mixture. But the difference between a proper and improper mixture is unde- 
tectable by Alice since she has no access to Bob's ancillas, and it is only by measuring 
the composite system consisting of the channel particle together with Bob's ancillas 
that Alice could ascertain that the channel particle is entangled with the ancillas. 

In fact, if it were possible to distinguish between a proper and improper mixture, it 
would be possible to signal superluminally: Alice could know instantaneously whether 
or not Bob performed a measurement on his ancillas by monitoring the channel
particles in her possession. Note that it makes no difference whether Bob or Alice
measures first, since the measurements are of observables in different Hilbert spaces, which
therefore commute. 

Clearly, a similar argument applies if Bob is required to choose between alternative 
unitary operations at some stage of a bit commitment protocol. Perhaps less obviously, 
an EPR cheating strategy is also possible if Bob is required to perform a measurement 
or choose between alternative operations on channel particle $i + 1$ conditional on the
outcome of a prior measurement on channel particle $i$, or conditional on a prior choice
of some operation from among a set of alternative operations. Of course, if Bob is in
possession of all the channel particles at the same time, he can perform an entanglement
with ancillas on the entire sequence, considered as a single composite system. But even 
if Bob only has access to one channel particle at a time (which he is required to return 
to Alice after performing a measurement or other operation before she sends him the 
next channel particle), he can always entangle channel particle i + 1 with the ancillas 
he used to entangle channel particle i. 

For example, suppose Bob is presented with two channel particles in sequence.
He is supposed to decide randomly whether to measure $X$ or $Y$ on the first particle,
perform the measurement, and return the particle to Alice. After Alice receives the first
particle, she sends Bob the second particle. If Bob measured $X$ on the first particle and
obtained the outcome $x_1$, he is supposed to measure $X$ on the second particle; if he
obtained the outcome $x_2$, he is supposed to measure $Y$ on the second particle. If he
measured $Y$ on the first particle and obtained the outcome $y_1$, he is supposed to apply
the unitary transformation $U_1$ to the second particle; if he obtained the outcome $y_2$,
he is supposed to apply the unitary transformation $U_2$. After performing the required
operation, he is supposed to return the second particle to Alice.

It would seem at first sight that Bob has to actually perform a measurement on the
first channel particle and obtain a particular outcome before he can apply the protocol
to the second particle, given that he only has access to one channel particle at a time,
so an EPR cheating strategy is excluded. But this is not so. Bob's strategy is the
following: He applies the EPR strategy discussed above for two alternative measurements
to the first channel particle. For the second channel particle, he applies the following
unitary transformation on the tensor product of the Hilbert spaces of his ancillas and
the channel particle, where the state of the second channel particle is denoted by $|\phi\rangle$,
and the state of the pointer ancilla for the second channel particle is denoted by $|q_0\rangle$ (a
second die particle is not required):

\[
\begin{aligned}
|d_X\rangle |p_1\rangle |\phi\rangle |q_0\rangle &\longrightarrow |d_X\rangle |p_1\rangle \otimes U_X |\phi\rangle |q_0\rangle \\
|d_X\rangle |p_2\rangle |\phi\rangle |q_0\rangle &\longrightarrow |d_X\rangle |p_2\rangle \otimes U_Y |\phi\rangle |q_0\rangle \\
|d_Y\rangle |p_1\rangle |\phi\rangle |q_0\rangle &\longrightarrow |d_Y\rangle |p_1\rangle \otimes U_1 |\phi\rangle |q_0\rangle \\
|d_Y\rangle |p_2\rangle |\phi\rangle |q_0\rangle &\longrightarrow |d_Y\rangle |p_2\rangle \otimes U_2 |\phi\rangle |q_0\rangle
\end{aligned}
\tag{174}
\]

5.2.3 Proof of the Quantum Bit Commitment Theorem 

Since an EPR cheating strategy can always be applied without detection, the proof of
the quantum bit commitment theorem assumes that at the end of the commitment stage
the composite system consisting of Alice's ancillas, the $n$ channel particles, and Bob's
ancillas will be represented by some composite entangled state $|0\rangle$ or $|1\rangle$, depending
on Alice's commitment,¹² on a Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$, where $\mathcal{H}_A$ is the Hilbert space
of the particles in Alice's possession at that stage (Alice's ancillas and the channel
particles retained by Alice, if any), and $\mathcal{H}_B$ is the Hilbert space of the particles in
Bob's possession at that stage (Bob's ancillas and the channel particles retained by
Bob, if any).

Now, the density operators $W_B(0)$ and $W_B(1)$, characterizing the information
available to Bob for the two alternative commitments, are obtained by tracing the states
$|0\rangle$ and $|1\rangle$ over $\mathcal{H}_A$. If these density operators are the same, then Bob will be unable
to distinguish the 0-state from the 1-state without further information from Alice. In
this case, the protocol is said to be 'concealing.' What the proof establishes, by an
application of the biorthogonal decomposition theorem, is that if $W_B(0) = W_B(1)$ then
there exists a unitary transformation in $\mathcal{H}_A$ that will transform $|0\rangle$ to $|1\rangle$. That is, if the
protocol is 'concealing' then it cannot be 'binding' on Alice: she can always follow the
protocol (with appropriate substitutions of an EPR strategy) to establish the state $|0\rangle$.
At the final stage when she is required to reveal her commitment, she can choose to
reveal the alternative commitment, depending on circumstances, by applying a suitable
unitary transformation in her own Hilbert space to transform $|0\rangle$ to $|1\rangle$ without Bob
being able to detect this move. So either Bob can cheat by obtaining some information
about Alice's choice before she reveals her commitment, or Alice can cheat.

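12 More precisely, depending on whether Alice intends to reveal 0 or 1, since we are assuming that Alice
will apply an EPR cheating strategy whenever this is relevant.

The essentials of the proof can be sketched as follows: In the Schmidt decomposition,
the states $|0\rangle$ and $|1\rangle$ can be expressed as:

\[
\begin{aligned}
|0\rangle &= \sum_i \sqrt{p_i}\, |a_i\rangle |b_i\rangle \\
|1\rangle &= \sum_j \sqrt{p'_j}\, |a'_j\rangle |b'_j\rangle
\end{aligned}
\tag{175}
\]

where $\{|a_i\rangle\}$, $\{|a'_j\rangle\}$ are two orthonormal sets of states in $\mathcal{H}_A$, and $\{|b_i\rangle\}$, $\{|b'_j\rangle\}$ are
two orthonormal sets in $\mathcal{H}_B$.

The density operators $W_B(0)$ and $W_B(1)$ are defined by:

\[
\begin{aligned}
W_B(0) &= \mathrm{Tr}_A |0\rangle\langle 0| = \sum_i p_i\, |b_i\rangle\langle b_i| \\
W_B(1) &= \mathrm{Tr}_A |1\rangle\langle 1| = \sum_j p'_j\, |b'_j\rangle\langle b'_j|
\end{aligned}
\tag{176}
\]

Bob can't cheat if and only if $W_B(0) = W_B(1)$. Now, by the spectral theorem, the
decompositions:

\[
\begin{aligned}
W_B(0) &= \sum_i p_i\, |b_i\rangle\langle b_i| \\
W_B(1) &= \sum_j p'_j\, |b'_j\rangle\langle b'_j|
\end{aligned}
\]

are unique for the nondegenerate case, where the $p_i$ are all distinct and the $p'_j$ are all
distinct. The condition $W_B(0) = W_B(1)$ implies that for all $k$:

\[
\begin{aligned}
p_k &= p'_k \\
|b_k\rangle &= |b'_k\rangle
\end{aligned}
\tag{177}
\]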

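and so

\[
\begin{aligned}
|0\rangle &= \sum_k \sqrt{p_k}\, |a_k\rangle |b_k\rangle \\
|1\rangle &= \sum_k \sqrt{p_k}\, |a'_k\rangle |b_k\rangle
\end{aligned}
\tag{178}
\]

It follows that there exists a unitary transformation $U$ on $\mathcal{H}_A$ such that

\[
\{|a_k\rangle\} \xrightarrow{U} \{|a'_k\rangle\}
\tag{179}
\]

and hence

\[
|0\rangle \xrightarrow{U} |1\rangle
\tag{180}
\]

As we shall see in §5.2.4, instead of transforming $|0\rangle$ to $|1\rangle$ by a unitary
transformation, Alice could achieve the same effect by preparing the state $|0\rangle$ and measuring
in either of two bases, depending on whether she intends to reveal 0 or 1.

The degenerate case can be handled in a similar way. Suppose that $p_1 = p_2 =
p'_1 = p'_2 = p$. Then $|b_1\rangle, |b_2\rangle$ and $|b'_1\rangle, |b'_2\rangle$ span the same subspace $\mathcal{H}$ in $\mathcal{H}_B$, and
hence (assuming the coefficients are distinct for $k > 2$):

\[
\begin{aligned}
|0\rangle &= \sqrt{p}\,\bigl(|a_1\rangle |b_1\rangle + |a_2\rangle |b_2\rangle\bigr) + \sum_{k>2} \sqrt{p_k}\, |a_k\rangle |b_k\rangle \\
|1\rangle &= \sqrt{p}\,\bigl(|a'_1\rangle |b'_1\rangle + |a'_2\rangle |b'_2\rangle\bigr) + \sum_{k>2} \sqrt{p_k}\, |a'_k\rangle |b_k\rangle \\
          &= \sqrt{p}\,\bigl(|a''_1\rangle |b_1\rangle + |a''_2\rangle |b_2\rangle\bigr) + \sum_{k>2} \sqrt{p_k}\, |a'_k\rangle |b_k\rangle
\end{aligned}
\tag{181}
\]

where $|a''_1\rangle, |a''_2\rangle$ are orthonormal states spanning the same subspace of $\mathcal{H}_A$ as
$|a'_1\rangle, |a'_2\rangle$. Since $\{|a''_1\rangle, |a''_2\rangle, |a'_3\rangle, \ldots\}$ is an
orthonormal set in $\mathcal{H}_A$, there exists a unitary transformation in $\mathcal{H}_A$ that transforms
$\{|a_k\rangle; k = 1, 2, 3, \ldots\}$ to $\{|a''_1\rangle, |a''_2\rangle, |a'_3\rangle, \ldots\}$, and hence $|0\rangle$ to $|1\rangle$.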

The extension of the theorem to the nonideal case, where $W_B(0) \approx W_B(1)$, so
that there is a small probability that Bob could distinguish the alternative commitments,
shows that Alice has a correspondingly large probability of cheating successfully: there
exists a unitary transformation $U$ in $\mathcal{H}_A$ that will transform $|0\rangle$ to a state sufficiently
close to $|1\rangle$ so that Alice can reveal whichever commitment she chooses, with a
correspondingly small probability of Bob being able to detect this move.



5.2.4 How the Theorem Works: An Example 

The following example by Asher Peres (private communication) is a beautiful illustra- 
tion of how the theorem works. (My analysis of the example owes much to correspon- 
dence with Adrian Kent and Dominic Mayers.) 

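Suppose Alice is required to send Bob a channel particle $C$ in an equal weight
mixture of the qubit states:

\[ |c_0\rangle = |0\rangle \tag{182} \]
\[ |c_2\rangle = -\tfrac{1}{2}|0\rangle + \tfrac{\sqrt{3}}{2}|1\rangle \tag{183} \]
\[ |c_4\rangle = -\tfrac{1}{2}|0\rangle - \tfrac{\sqrt{3}}{2}|1\rangle \tag{184} \]

if she commits to 0, and an equal weight mixture of the qubit states:

\[ |c_1\rangle = |1\rangle \tag{185} \]
\[ |c_3\rangle = \tfrac{\sqrt{3}}{2}|0\rangle - \tfrac{1}{2}|1\rangle \tag{186} \]
\[ |c_5\rangle = -\tfrac{\sqrt{3}}{2}|0\rangle - \tfrac{1}{2}|1\rangle \tag{187} \]
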
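if she commits to 1. Note that these two mixtures have the same density operator:

\[ \rho_0 = \rho_1 = I/2 \tag{188} \]

Suppose Alice tries to implement an EPR cheating strategy by preparing the
entangled state of a system $AC$:

\[
|0\rangle = \frac{1}{\sqrt{3}}\bigl(|a_0\rangle|c_0\rangle + |a_2\rangle|c_2\rangle + |a_4\rangle|c_4\rangle\bigr)
\tag{189}
\]

where $\{|a_0\rangle, |a_2\rangle, |a_4\rangle\}$ is an orthonormal basis in the 3-dimensional Hilbert space $\mathcal{H}_A$
of a suitable ancilla system $A$. If Alice could transform the state $|0\rangle$ to the state:

\[
|1\rangle = \frac{1}{\sqrt{3}}\bigl(|a_1\rangle|c_1\rangle + |a_3\rangle|c_3\rangle + |a_5\rangle|c_5\rangle\bigr)
\tag{190}
\]

where $\{|a_1\rangle, |a_3\rangle, |a_5\rangle\}$ is another orthonormal basis in $\mathcal{H}_A$, by a local unitary
transformation in $\mathcal{H}_A$, she could delay her commitment to the opening stage. If, at that stage,
she decides to commit to 0, she measures the observable with eigenstates $\{|a_0\rangle, |a_2\rangle, |a_4\rangle\}$.
If she decides to commit to 1, she performs the local unitary transformation taking the
state $|0\rangle$ to the state $|1\rangle$ and measures the observable with eigenstates $\{|a_1\rangle, |a_3\rangle, |a_5\rangle\}$.
Now, $|0\rangle$ can be expressed as:

\[
|0\rangle = \frac{1}{\sqrt{3}}\left( |a_0\rangle \frac{|c_3\rangle - |c_5\rangle}{\sqrt{3}} + |a_2\rangle \frac{|c_1\rangle - |c_3\rangle}{\sqrt{3}} + |a_4\rangle \frac{|c_5\rangle - |c_1\rangle}{\sqrt{3}} \right)
\tag{191}
\]

\[
\phantom{|0\rangle} = \frac{1}{\sqrt{3}}\left( \frac{|a_2\rangle - |a_4\rangle}{\sqrt{3}} |c_1\rangle + \frac{|a_0\rangle - |a_2\rangle}{\sqrt{3}} |c_3\rangle + \frac{|a_4\rangle - |a_0\rangle}{\sqrt{3}} |c_5\rangle \right)
\tag{192}
\]

In this representation of $|0\rangle$, the factor states $\frac{|a_2\rangle - |a_4\rangle}{\sqrt{3}}$, $\frac{|a_0\rangle - |a_2\rangle}{\sqrt{3}}$, $\frac{|a_4\rangle - |a_0\rangle}{\sqrt{3}}$ in $\mathcal{H}_A$ are
not orthogonal; in fact, they are coplanar:

\[
|a_0\rangle - |a_2\rangle = -\bigl(|a_2\rangle - |a_4\rangle\bigr) - \bigl(|a_4\rangle - |a_0\rangle\bigr)
\tag{193}
\]

So it seems that there cannot be a suitable unitary transformation that will map $|0\rangle$ to
$|1\rangle$ and the EPR strategy is blocked!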

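Of course, this is not the case. To see that there is such a unitary transformation,
note that $|0\rangle$ and $|1\rangle$ can be expressed in the Schmidt decomposition as:

\[
|0\rangle = \frac{1}{\sqrt{2}}\left( \frac{2|a_0\rangle - |a_2\rangle - |a_4\rangle}{\sqrt{6}} |c_0\rangle + \frac{|a_2\rangle - |a_4\rangle}{\sqrt{2}} |c_1\rangle \right)
\tag{194}
\]

\[
|1\rangle = \frac{1}{\sqrt{2}}\left( \frac{|a_3\rangle - |a_5\rangle}{\sqrt{2}} |c_0\rangle + \frac{2|a_1\rangle - |a_3\rangle - |a_5\rangle}{\sqrt{6}} |c_1\rangle \right)
\tag{195}
\]

Clearly, now, there exists a unitary transformation $U$ in $\mathcal{H}_A$ such that:

\[
|0\rangle \xrightarrow{U} |1\rangle
\tag{196}
\]

It follows that:

\[
\{|a_0\rangle, |a_2\rangle, |a_4\rangle\} \xrightarrow{U} \{|a'_0\rangle, |a'_2\rangle, |a'_4\rangle\}
\tag{197}
\]

where $\{|a'_0\rangle, |a'_2\rangle, |a'_4\rangle\}$ is a basis in $\mathcal{H}_A$, and so

\[
|1\rangle = \frac{1}{\sqrt{3}}\bigl(|a'_0\rangle|c_0\rangle + |a'_2\rangle|c_2\rangle + |a'_4\rangle|c_4\rangle\bigr)
\tag{198}
\]

\[
\phantom{|1\rangle} = \frac{1}{\sqrt{3}}\bigl(|a_1\rangle|c_1\rangle + |a_3\rangle|c_3\rangle + |a_5\rangle|c_5\rangle\bigr)
\tag{199}
\]

So Alice could implement the EPR cheating strategy by preparing the state $|1\rangle$
and measuring in the basis $\{|a'_0\rangle, |a'_2\rangle, |a'_4\rangle\}$ for the 0-commitment, or in the basis
$\{|a_1\rangle, |a_3\rangle, |a_5\rangle\}$ for the 1-commitment. Equivalently, of course, she could prepare the
state $|0\rangle$ and measure in two different bases, since the unitary transformation that takes
$|1\rangle$ to $|0\rangle$ also takes the basis $\{|a_1\rangle, |a_3\rangle, |a_5\rangle\}$ to the basis $\{|a''_1\rangle, |a''_3\rangle, |a''_5\rangle\}$, and so:

\[
|0\rangle = \frac{1}{\sqrt{3}}\bigl(|a_0\rangle|c_0\rangle + |a_2\rangle|c_2\rangle + |a_4\rangle|c_4\rangle\bigr)
\tag{200}
\]

\[
\phantom{|0\rangle} = \frac{1}{\sqrt{3}}\bigl(|a''_1\rangle|c_1\rangle + |a''_3\rangle|c_3\rangle + |a''_5\rangle|c_5\rangle\bigr)
\tag{201}
\]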

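A calculation shows that:

\[
|a''_1\rangle = \frac{1}{3}\bigl(|a_0\rangle + (1 + \sqrt{3})|a_2\rangle + (1 - \sqrt{3})|a_4\rangle\bigr)
\tag{202}
\]

\[
|a''_3\rangle = \frac{1}{3}\bigl((1 + \sqrt{3})|a_0\rangle + (1 - \sqrt{3})|a_2\rangle + |a_4\rangle\bigr)
\tag{203}
\]

\[
|a''_5\rangle = \frac{1}{3}\bigl((1 - \sqrt{3})|a_0\rangle + |a_2\rangle + (1 + \sqrt{3})|a_4\rangle\bigr)
\tag{204}
\]

In effect, if Alice prepares the entangled state $|0\rangle$ and measures the ancilla $A$ in the
$\{|a_0\rangle, |a_2\rangle, |a_4\rangle\}$ basis, she steers the channel particle into a mixture of nonorthogonal
states $\{|c_0\rangle, |c_2\rangle, |c_4\rangle\}$. If she measures in the $\{|a''_1\rangle, |a''_3\rangle, |a''_5\rangle\}$ basis, she steers the
channel particle into a mixture of nonorthogonal states $\{|c_1\rangle, |c_3\rangle, |c_5\rangle\}$.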
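
The steering claim can be checked directly from equations (182)-(189) and (202)-(204).
The following is a minimal sketch in plain Python: it represents the ancilla states by
their components along $|a_0\rangle, |a_2\rangle, |a_4\rangle$, confirms that the double-primed states are
orthonormal, and confirms that projecting $|0\rangle$ onto each of them leaves the channel qubit
in the corresponding state $|c_1\rangle$, $|c_3\rangle$ or $|c_5\rangle$ (up to the common factor $1/\sqrt{3}$). The
variable and function names are illustrative assumptions.

```python
import math

R3 = math.sqrt(3)

# Channel states (182)-(187) as real vectors in the {|0>, |1>} basis.
C = {
    0: (1.0, 0.0), 2: (-0.5, R3 / 2), 4: (-0.5, -R3 / 2),
    1: (0.0, 1.0), 3: (R3 / 2, -0.5), 5: (-R3 / 2, -0.5),
}

# Ancilla states (202)-(204): components along |a0>, |a2>, |a4>.
A_DOUBLE_PRIME = {
    1: (1 / 3, (1 + R3) / 3, (1 - R3) / 3),
    3: ((1 + R3) / 3, (1 - R3) / 3, 1 / 3),
    5: ((1 - R3) / 3, 1 / 3, (1 + R3) / 3),
}

# State |0> of (189): row i holds the channel amplitudes attached to a0, a2, a4.
STATE_ZERO = [[C[k][j] / R3 for j in range(2)] for k in (0, 2, 4)]


def dot(u, v):
    """Real inner product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def steered(ancilla):
    """Unnormalized channel state <ancilla|0> left after that ancilla outcome."""
    return tuple(sum(ancilla[i] * STATE_ZERO[i][j] for i in range(3)) for j in range(2))


def mixture(labels):
    """Equal-weight density matrix of the channel states with the given labels."""
    return [[sum(C[k][r] * C[k][c] for k in labels) / 3 for c in range(2)]
            for r in range(2)]


def close(x, y, tol=1e-12):
    return abs(x - y) < tol


for k in (1, 3, 5):
    target = tuple(x / R3 for x in C[k])
    ok = all(close(s, t) for s, t in zip(steered(A_DOUBLE_PRIME[k]), target))
    print(k, ok)
```

Each outcome occurs with probability 1/3 (the squared norm of the steered vector), so
the measurement in the double-primed basis reproduces the equal weight mixture required
for the 1-commitment.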

It follows that Alice can implement the EPR cheating strategy without performing 
any unitary transformation — she simply entangles the channel particle with a suitable 
ancilla particle and performs one of two measurements at the opening stage, depending 
on her commitment. This shows that the unitary transformation required by the theorem 
is not in fact required. If a cheating strategy is possible in which Alice, at the open- 
ing stage, either makes a measurement on an entangled state for the 0-commitment, 
or transforms this entangled state to a different state by a local unitary transformation 
in her Hilbert space and then makes a measurement on the transformed state for the 
1-commitment, then an equally good cheating strategy is available in which Alice pre- 
pares one entangled state for both commitments, and measures in two alternative bases 
at the opening stage, depending on her commitment. 



5.2.5 A Final Worry Laid to Rest 

The heart of the mathematical proof is the Schmidt decomposition theorem. But the 
essential conceptual insight is the possibility of enlarging the Hilbert space and
implementing an EPR strategy without detection.

This raises the following question: Suppose Bob cannot cheat because $W_B(0) =
W_B(1)$, so by the theorem there exists a unitary transformation $U$ in $\mathcal{H}_A$ that will
transform $|0\rangle$ to $|1\rangle$. Could there be a protocol in which Alice also cannot cheat because,
although there exists a suitable unitary transformation U, she cannot know what uni- 
tary transformation to apply? This is indeed the case, but only if U depends on Bob's 
operations, which are unknown to Alice. But then Bob would have to actually make 
a definite choice or obtain a definite outcome in a measurement, and he could always 
avoid doing so without detection by applying an EPR strategy. 

This raises a further question: How do we know that following an EPR strategy 
is never disadvantageous to the cheater? If it were, Bob might choose to avoid an EPR
strategy in a certain situation because it would be disadvantageous to him. Could there 
be a bit commitment protocol where the application of an EPR strategy by Bob at a 
certain stage of the protocol would give Alice the advantage, rather than Bob, while 
conforming to the protocol would ensure that neither party could cheat? If there were 
such a protocol, then Bob would, in effect, be forced to conform to the protocol and 
avoid the EPR strategy, and unconditionally secure bit commitment would be possible. 

In fact, the impossibility of such a protocol follows from the theorem (see [Bub, 2001a]).
Suppose there were such a protocol. That is, suppose that if Bob applies an EPR
strategy then $W_B(0) = W_B(1)$, so by the theorem there exists a unitary transformation
$U$ in Alice's Hilbert space that will transform $|0\rangle$ to $|1\rangle$. Alice must know this $U$
because it is uniquely determined by Bob's deviation from the protocol according to an
EPR strategy that keeps all disjunctions at the quantum level as linear superpositions.
Suppose also that if, instead, Bob is honest and follows the protocol (so that there is a
definite choice for every disjunction over possible operations or possible measurement
outcomes), then $W_B(0) = W_B(1)$, but the unitary transformation in Alice's Hilbert
space that allows her to transform $|0\rangle$ to $|1\rangle$ depends on Bob's choices or measurement
outcomes, which are unknown to Alice.

The point to note is that the information available in Alice's Hilbert space must be 
the same whether Bob follows the protocol and makes determinate choices and obtains 
determinate measurement outcomes before Alice applies the unitary transformation U 
that transforms $|0\rangle$ to $|1\rangle$, or whether he deviates from the protocol via an EPR strategy
in which he implements corresponding entanglements with his ancillas to keep choices 
and measurement outcomes at the quantum level before Alice applies the transforma- 
tion U, and only makes these choices and measurement outcomes definite at the final 
stage of the protocol by measuring his ancillas. There can be no difference for Alice 
because Bob's measurements on his ancillas and any measurements or operations that 
Alice might perform take place in different Hilbert spaces, so the operations commute. 
If Alice's density operator (obtained by tracing over Bob's ancillas), which character- 
izes the statistics of measurements that Alice can perform in her part of the universe, 
were different depending on whether or not Bob actually carried out the required mea- 
surements, as opposed to keeping the alternatives at the quantum level by implement- 
ing corresponding entanglements with ancillas, then it would be possible to use this 
difference to signal superluminally. Actual measurements by Bob on his ancillas that 
selected alternatives in the entanglements as determinate would instantaneously alter 
the information available in Alice's part of the universe. 

It follows that in the hypothetical bit commitment protocol we are considering, the
unitary transformation $U$ in Alice's Hilbert space that transforms $|0\rangle$ to $|1\rangle$ must be
the same transformation in the honest scenario as in the cheating scenario. But we 
are assuming that the transformation in the honest scenario is unknown to Alice and 
depends on Bob's measurement outcomes, while the transformation in the cheating 
scenario is unique and known to Alice. So there can be no such protocol: the deviation 
from the protocol by an EPR strategy can never place Bob in a worse position than 
following the protocol honestly. 

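The argument can be put formally in terms of the theorem as follows: The cheating
scenario produces one of two alternative pure states $|0\rangle_c$ or $|1\rangle_c$ in $\mathcal{H}_A \otimes \mathcal{H}_B$ ('c' for
'cheating strategy'). Since the reduced density operators in $\mathcal{H}_B$:

\[
\begin{aligned}
W_B^{(c)}(0) &= \mathrm{Tr}_A |0\rangle\langle 0|_c \\
W_B^{(c)}(1) &= \mathrm{Tr}_A |1\rangle\langle 1|_c
\end{aligned}
\tag{205}
\]

are required by assumption to be the same:

\[
W_B^{(c)}(0) = W_B^{(c)}(1)
\tag{206}
\]

the states $|0\rangle_c$ and $|1\rangle_c$ can be expressed in biorthogonal decomposition as:

\[
\begin{aligned}
|0\rangle_c &= \sum_i \sqrt{p_i}\, |a_i\rangle |b_i\rangle \\
|1\rangle_c &= \sum_i \sqrt{p_i}\, |a'_i\rangle |b_i\rangle
\end{aligned}
\tag{207}
\]

where the reduced density operators in $\mathcal{H}_A$:

\[
\begin{aligned}
W_A^{(c)}(0) &= \mathrm{Tr}_B |0\rangle\langle 0|_c = \sum_i p_i\, |a_i\rangle\langle a_i| \\
W_A^{(c)}(1) &= \mathrm{Tr}_B |1\rangle\langle 1|_c = \sum_i p_i\, |a'_i\rangle\langle a'_i|
\end{aligned}
\tag{208}
\]

are different:

\[
W_A^{(c)}(0) \neq W_A^{(c)}(1)
\tag{209}
\]

It follows that there exists a unitary operator $U_c$ on $\mathcal{H}_A$ defined by the spectral
representations of $W_A^{(c)}(0)$ and $W_A^{(c)}(1)$:

\[
\{|a_i\rangle\} \xrightarrow{U_c} \{|a'_i\rangle\}
\tag{210}
\]

such that:

\[
|0\rangle_c \xrightarrow{U_c} |1\rangle_c
\tag{211}
\]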

The honest scenario produces one of two alternative pure states $|0\rangle_h$ and $|1\rangle_h$ in
$\mathcal{H}_A \otimes \mathcal{H}_B$ ('h' for 'honest'), where the pair $\{|0\rangle_h, |1\rangle_h\}$ depends on Bob's choices and
the outcomes of his measurements.

By assumption, as in the cheating scenario, the reduced density operators $W_B^{(h)}(0)$
and $W_B^{(h)}(1)$ in $\mathcal{H}_B$ are the same:

\[
W_B^{(h)}(0) = W_B^{(h)}(1)
\tag{212}
\]

which entails the existence of a unitary operator $U_h$ on $\mathcal{H}_A$ such that:

\[
|0\rangle_h \xrightarrow{U_h} |1\rangle_h
\tag{213}
\]

where $U_h$ depends on Bob's choices and measurement outcomes.

Now, the difference between the honest scenario and the cheating scenario is
undetectable in $\mathcal{H}_A$, which means that the reduced density operators in $\mathcal{H}_A$ are the same in
the honest scenario as in the cheating scenario:

\[
\begin{aligned}
W_A^{(h)}(0) &= W_A^{(c)}(0) \\
W_A^{(h)}(1) &= W_A^{(c)}(1)
\end{aligned}
\tag{214}
\]

Since $U_h$ is defined by the spectral representations of $W_A^{(h)}(0)$ and $W_A^{(h)}(1)$, it follows
that $U_h = U_c$. But we are assuming that $U_h$ depends on Bob's choices and measurement
outcomes, while $U_c$ is uniquely defined by Bob's EPR strategy, in which there
are no determinate choices or measurement outcomes. Conclusion: there can be no bit 
commitment protocol in which neither Alice nor Bob can cheat if Bob honestly follows 
the protocol, but Alice can cheat if Bob deviates from the protocol via an EPR strategy. 
If neither Bob nor Alice can cheat in the honest scenario, then Bob and not Alice must 
be able to cheat in the cheating scenario. 

A similar argument rules out a protocol in which neither party can cheat if Bob is
honest (as above), but if Bob follows an EPR strategy, then $W_B(0) \approx W_B(1)$, so Bob
has some probability of cheating successfully, but Alice has a greater probability of
cheating successfully than Bob. Again, the unitary transformation $U_c$ that would allow
Alice to cheat with a certain probability of success if Bob followed an EPR strategy
would also have to allow Alice to cheat successfully if Bob were honest. But the
supposition is that Alice cannot cheat if Bob is honest, because the unitary transformation $U_h$
in that case depends on Bob's choices and measurement outcomes, which are unknown
to Alice. It follows that there can be no such protocol.

So there is no loophole in the proof of the theorem. Unconditionally secure quan- 
tum bit commitment (in the sense of the theorem) really is impossible. 



6 Quantum Computation 

6.1 The Church-Turing Thesis and Computational Complexity 

The classical theory of computation concerns the question of what can be computed, 
and how efficiently. 

Various formal notions of computability by Alonzo Church, Kurt Gödel, and others
can all be shown to be equivalent to Alan Turing's notion of computability by a Turing
machine (see, e.g., [Lewis and Papadimitriou, 1981]). A Turing machine is an abstract
computational device that can be in one of a finite set of possible states. It has a
potentially infinite tape of consecutive cells to store information (0, 1, or blank in each
cell) and a movable tape head that reads the information in a cell. Depending on the
symbol in a cell and the state of the machine, the tape head overwrites the symbol, changes
the state, and moves one cell to the right or the left until it finally halts at the completion
of the computation. A program for a Turing machine T (e.g., a program that executes
a particular algorithm for finding the prime factors of an integer) is a finite string of
symbols (which can be expressed as a binary number b(T)) indicating, for each state
and each symbol, a new state, new symbol, and head displacement. Turing showed that
there exists a universal Turing machine U that can simulate the program of any Turing
machine T with at most a polynomial slow-down, i.e., if we initialize U with b(T)
and the input to T, then U performs the same computation as T, where the number
of steps taken by U to simulate each step of T is a polynomial function of b(T). The
Church-Turing thesis is the proposal to identify the class of computable functions with
the class of functions computable by a universal Turing machine. Equivalently, one
could formulate the Church-Turing thesis in terms of decision problems, which have
yes-or-no answers (e.g., the problem of determining whether a given number is a prime
number).
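
The description of a Turing machine above can be made concrete with a short simulator.
The following is a minimal sketch, assuming a one-tape deterministic machine whose
program is a lookup table from (state, symbol) to (new state, new symbol, head
displacement). The state names, the blank symbol ' ', the step limit, and the example
program (which complements each bit of its input) are illustrative assumptions.

```python
def run_turing_machine(program, tape, state="start", halt="halt", max_steps=10_000):
    """Run a deterministic one-tape Turing machine and return (output, steps).

    program maps (state, symbol) -> (new_state, new_symbol, move), move in {-1, +1}.
    Cells that were never written hold the blank symbol ' '.
    """
    cells = dict(enumerate(tape))
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = cells.get(head, " ")
        state, new_symbol, move = program[(state, symbol)]
        cells[head] = new_symbol      # overwrite the symbol under the head
        head += move                  # move one cell left or right
        steps += 1
    lo, hi = min(cells), max(cells)
    output = "".join(cells.get(i, " ") for i in range(lo, hi + 1)).strip()
    return output, steps


# Example program: flip every bit, halt on the first blank cell.
COMPLEMENT = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", " "): ("halt", " ", +1),
}

result, step_count = run_turing_machine(COMPLEMENT, "1011")
print(result, step_count)
```

The program table plays the role of the finite string of symbols that specifies the
machine; a universal machine would take such a table as part of its input.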

Intuitively, some computations are harder than others, and some algorithms take
more time than others. The computational complexity of an algorithm is measured by
the number of steps required by a Turing machine to run through the algorithm. A
decision problem is said to be in complexity class P, hence easy or tractable, if there
exists an algorithm for solving the problem in polynomial time, i.e., in a number of
steps that is a polynomial function of the size $n$ of the input (the number of bits required
to store the input). A problem is said to be hard or intractable if there does not exist a
polynomial-time algorithm for solving the problem. A problem is in complexity class
EXP if the most efficient algorithm requires a number of steps that is an exponential
function of the size $n$ of the input. The number of steps here refers to the worst-case
running time, $\tau$, which is of the order $O(n^k)$ for a polynomial-time algorithm and of
the order $O(2^n)$ for an exponential-time algorithm.

Note that an exponential-time algorithm could be more efficient than a
polynomial-time algorithm for some range of input sizes, so the above terminology should be
understood with caution. Consider the following example (taken from [Barenco, 1998,
p. 145]): $\tau_P(n) = 10^{-23} n^{1000} + 10^{23}/n \approx O(n^{1000})$ because, for sufficiently large
$n$, the polynomial term dominates (i.e., $\tau_P(n) < c\,n^{1000}$ for a fixed factor $c$), and
$\tau_E(n) = 10^{23} n^{1000} + 10^{-23}\, 2^n \approx O(2^n)$ because, for sufficiently large $n$, the
exponential term dominates (i.e., $\tau_E(n) < c\,2^n$ for a fixed factor $c$). But for small enough
values of $n$, $\tau_E(n) < \tau_P(n)$.

A Turing machine as defined above is a deterministic machine. A nondeterministic
or probabilistic Turing machine makes a random choice between multiple transitions
(to a new symbol, new state, and head displacement) for each symbol and each state.
For each sequence of choices, the sequence of transitions corresponds to a sequence of
steps executed by a deterministic Turing machine. If any of these machines halts, the
computation is regarded as completed. Evidently, a nondeterministic Turing machine
cannot compute a function that is not computable by a deterministic Turing machine,
but it is believed (though not proved) that certain problems can be solved more efficiently
by nondeterministic Turing machines than by any deterministic Turing machine. The 
complexity class NP is the class of problems that can be solved in polynomial time 
by a nondeterministic Turing machine. This is equivalent to the class of problems for 
which proposed solutions can be verified in polynomial time by a deterministic Turing 
machine. For example, it is believed (but not proved) that the problem of factoring an 
integer into its prime factors is a 'hard' problem: there is no known polynomial-time 
algorithm for this problem. However, the problem of checking whether a candidate fac- 
tor of an integer is indeed a factor can be solved in polynomial time, so factorizability 
is an NP problem. 
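
The asymmetry between finding and checking a factor can be illustrated as follows. This
is a minimal sketch, assuming Python's built-in arbitrary-precision integers: checking a
candidate takes a single division, whereas the naive search shown here (trial division,
chosen only for simplicity and not the best known classical method) takes a number of
steps that grows exponentially with the number of digits. The function names are
illustrative assumptions.

```python
def is_factor(candidate, n):
    """Verify a proposed nontrivial factor of n with one division."""
    return 1 < candidate < n and n % candidate == 0


def smallest_factor_by_trial_division(n):
    """Search for the smallest nontrivial factor of n, or return None if there is none.

    The loop runs up to sqrt(n) times, i.e. exponentially many steps in the
    number of bits needed to write n down.
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None


n = 2**32 + 1                       # the fifth Fermat number
print(is_factor(641, n))            # verification: one division
print(smallest_factor_by_trial_division(n))
```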

Clearly P ⊆ NP, but it is an open problem in complexity theory whether P = NP.
An NP problem is said to be NP-complete if every NP problem can be shown to have
a solution with a number of steps that is a polynomial function of the number of steps
required to solve the NP-complete problem. So if an NP-complete problem can be
solved in polynomial time, then all NP problems can be solved in polynomial time,
and P = NP. The problem of determining whether a Boolean function $f: \{0, 1\}^n \to
\{0, 1\}$ is satisfiable (i.e., whether there is a set of input values for which the function
takes the value 1, or equivalently whether there is an assignment of truth values to the
atomic sentences of a compound sentence of Boolean logic under which the compound
sentence comes out true) is an NP-complete problem. Factorizability is an NP problem
that is not known to be NP-complete.
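
Satisfiability shows the same pattern of easy verification and apparently hard search. The
following is a minimal sketch, assuming the Boolean function is supplied as an ordinary
Python callable on n bits: verifying a proposed assignment is one function evaluation,
while the exhaustive search tries up to $2^n$ assignments. The example formula and the
function names are illustrative assumptions.

```python
from itertools import product


def verify(f, assignment):
    """Check a proposed satisfying assignment with a single evaluation of f."""
    return f(*assignment) == 1


def brute_force_sat(f, n):
    """Return the first satisfying assignment of f on n bits, or None if unsatisfiable."""
    for bits in product((0, 1), repeat=n):
        if f(*bits) == 1:
            return bits
    return None


def example(a, b, c):
    """(a or b) and (not a or c) and (not b or not c), with bits as 0/1."""
    return int((a or b) and ((not a) or c) and ((not b) or (not c)))


def contradiction(a):
    """a and not a: never satisfiable."""
    return int(a and not a)


witness = brute_force_sat(example, 3)
print(witness, verify(example, witness))
print(brute_force_sat(contradiction, 1))
```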

Since a Turing machine can simulate any classical computing device with at most a 
polynomial slow-down, the complexity classes are the same for any model of computa- 
tion. For example, a circuit computer computes the value of a function by transforming 
data stored in an input register, representing the input to the function, via Boolean 
circuits constructed of elementary Boolean gates connected by wires, to data in an out- 
put register representing the value of the function computed. The elementary Boolean 
gates are 1-bit gates (such as the NOT gate, which transforms 0 to 1, and conversely)
and 2-bit gates (such as the AND gate, which takes two input bits to 1 if and only if
they are both 1, otherwise to 0), and it can be shown that a combination of such gates
forms a 'universal set' that suffices for any transformation of n bits. In fact, it turns
out that one of the sixteen possible 2-bit Boolean gates, the NAND gate (or NOT AND
gate), which takes two input bits to 0 if and only if they are both 1, forms a universal
set by itself. 
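
The universality of NAND can be illustrated by building the other elementary gates from
it. The following is a minimal sketch, with bits represented as the integers 0 and 1; the
function names are illustrative assumptions. Note that feeding the same bit into a gate
twice, as in the NOT construction, assumes that bits can be freely copied along wires.

```python
def nand(a, b):
    """NAND: output 0 if and only if both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


def not_(a):
    """NOT from a single NAND with both inputs tied together."""
    return nand(a, a)


def and_(a, b):
    """AND as NAND followed by NOT."""
    return not_(nand(a, b))


def or_(a, b):
    """OR via De Morgan: a OR b = NAND(NOT a, NOT b)."""
    return nand(not_(a), not_(b))


def xor_(a, b):
    """XOR from four NAND gates."""
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b), xor_(a, b))
```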

In a circuit model of a quantum computer, the registers store qubits, which are then
manipulated by elementary unitary gates. It can be shown (see [Nielsen and Chuang, 2000,
p. 188]) that a set of single-qubit and two-qubit unitary gates (the CNOT gate, the
Hadamard gate, the phase gate, and the $\pi/8$ gate) forms a universal set, in the sense
that any unitary transformation of $n$ qubits can be approximated to arbitrary accuracy
by a quantum circuit consisting of these gates connected in some combination.
The CNOT gate ('C' for 'controlled') has two input qubits, a 'control' qubit and a
'target' qubit. The gate functions so as to flip the target qubit if and only if the
control qubit is $|1\rangle$. The remaining three gates are single-qubit gates. The Hadamard
gate transforms $|0\rangle$ to $(|0\rangle + |1\rangle)/\sqrt{2}$ and $|1\rangle$ to $(|0\rangle - |1\rangle)/\sqrt{2}$. It is sometimes
referred to as the 'square root of NOT' gate, although the name is loose: two successive
applications of the Hadamard gate return the original state rather than transforming
$|0\rangle$ to $|1\rangle$. The phase gate leaves $|0\rangle$ unchanged and transforms $|1\rangle$ to $i|1\rangle$.
The $\pi/8$ gate leaves $|0\rangle$ unchanged and transforms $|1\rangle$ to $e^{i\pi/4}|1\rangle$.
(See [Nielsen and Chuang, 2000, p. 174] for a discussion and why the $\pi/8$ gate is so
named.)
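
The action of these gates can be written out as small matrices. The following is a
minimal sketch in plain Python, assuming the standard matrix forms of the gates in the
computational basis (with the phase gate taken to be diag(1, i)) and the basis ordering
|control, target> = |00>, |01>, |10>, |11> for CNOT. The helper names are illustrative
assumptions.

```python
import cmath
import math

S2 = 1 / math.sqrt(2)

HADAMARD = [[S2, S2], [S2, -S2]]
PHASE = [[1, 0], [0, 1j]]
PI_OVER_8 = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
# CNOT in the basis |control, target> = |00>, |01>, |10>, |11>.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def apply(gate, vec):
    """Apply a gate (square matrix) to a state vector."""
    return [sum(gate[r][c] * vec[c] for c in range(len(vec))) for r in range(len(gate))]


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def close(u, v, tol=1e-12):
    """Entrywise comparison of two vectors."""
    return all(abs(x - y) < tol for x, y in zip(u, v))


print(apply(HADAMARD, [1, 0]))      # amplitudes of (|0> + |1>)/sqrt(2)
print(apply(CNOT, [0, 0, 1, 0]))    # |10> is mapped to |11>
t_squared = matmul(PI_OVER_8, PI_OVER_8)
print(all(close(t_squared[r], PHASE[r]) for r in range(2)))
```

The last check shows that two applications of the $\pi/8$ gate give the phase gate, so the
single-qubit gates in the universal set are not independent of one another.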

There are other models of quantum computation. In the 'cluster state' or 'one-way'
quantum computer of Raussendorf and Briegel [2001b, 2001a], a fixed multi-qubit
entangled state (called a 'cluster state'), independent of the computation, is prepared.
Then a sequence of single-qubit measurements is performed on this state,
where the choice of what observables to measure depends on the outcomes of the
previous measurements. No unitary transformations are involved. Remarkably, it
can be shown that any quantum circuit of unitary gates and measurements can be
simulated by a cluster state computer with similar resources of qubits and time (see
[Jozsa, 2005], [Nielsen, 2003], [Nielsen, 2005]).

The interesting question is whether a quantum computer can perform computa- 
tional tasks that are not possible for a Turing machine, or perform such tasks more 
efficiently than any Turing machine. Since a Turing machine is defined by its program, 
and a program can be specified by a finite set of symbols, there are only countably many 
Turing machines. There are uncountably many functions on the natural numbers, so 
there are uncountably many uncomputable functions, i.e., functions that are not com- 
putable by any Turing machine. A quantum computer cannot compute a function that 
is not Turing-computable, because a Turing machine can simulate (albeit inefficiently, 
with an exponential slow-down [Feynman, 1982]) the dynamical evolution of any system,
classical or quantum, with arbitrary accuracy. But there are computational tasks
that a quantum computer can perform by exploiting entanglement that are impossible
for any Turing machine. Recall the discussion of Bell's counterargument to the EPR
argument in §3.1.1: a quantum computer, but no classical computer, can perform the
task of rapidly producing pairs of values (0 or 1) for pairs of input angles at different
locations, with correlations that violate Bell's inequality, where the response time is
less than the time taken by light to travel between the locations.
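
The gap between the two kinds of device can be illustrated numerically. The sketch below assumes the CHSH form of Bell's inequality and the singlet-state correlation $E(a, b) = -\cos(a - b)$; both are standard, but they are assumptions here and may differ from the form of the inequality used in the earlier section. The function names are hypothetical.

```python
import itertools
import math

def singlet_correlation(a, b):
    """Quantum prediction for the correlation of +-1 outcomes at angles a and b."""
    return -math.cos(a - b)

def chsh(E, a, a2, b, b2):
    """CHSH combination |E(a,b) - E(a,b') + E(a',b) + E(a',b')|."""
    return abs(E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2))

# Angle settings commonly used to maximize the quantum value
quantum_value = chsh(singlet_correlation, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

def classical_maximum():
    """Largest CHSH value over all local deterministic assignments of +-1 outcomes."""
    best = 0
    for A1, A2, B1, B2 in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(A1 * B1 - A1 * B2 + A2 * B1 + A2 * B2))
    return best
```

Under these assumptions `classical_maximum()` is 2, while `quantum_value` is $2\sqrt{2}$, so no pre-assigned table of answers reproduces the entangled-state correlations.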

The current interest in quantum computers concerns the question of whether a 
quantum computer can compute certain Turing-computable functions more efficiently 
than any Turing machine. In the following section, I discuss quantum algorithms that 
achieve an exponential speed-up over any classical algorithm, or an exponential speed- 
up over any known classical algorithm. The most spectacular of these is Shor's
factorization algorithm, and a related algorithm for solving the discrete log problem.$^{13}$

The factorization algorithm has an important practical application to cryptography.
Public-key distribution protocols such as RSA [Rivest et al., 1978] (widely used
in commercial transactions over the internet, transactions between banks and financial
institutions, etc.) rely on factoring being a 'hard' problem. (Preskill [2005] notes that
currently the 65-digit factors of a 130-digit integer can be found in about a month using
a network of hundreds of work stations implementing the best known classical factoring
algorithm (the 'number field sieve algorithm'). He estimates that factoring a 400-digit
integer would take about $10^{10}$ years, which is roughly the age of the universe.) To see
the idea behind the RSA protocol, suppose Alice wishes to send a secret message to Bob. Bob's
public key consists of two large integers, $s$ and $c$. Alice encrypts the message $m$ (in
the form of a binary number) as $e = m^s \bmod c$ and sends the encrypted message to
Bob. Bob decrypts the message as $e^t \bmod c$, where $t$ is an integer known only to Bob.
The integer $t$ for which $m = e^t \bmod c$ can easily be determined from $s$ and the factors
of $c$, but since $c = pq$ is the product of two large prime numbers known only to
Bob, an eavesdropper, Eve, can read the message only if she can factor $c$ into its prime
factors. The cleverness of the scheme resides in the fact that no secret key needs to
be distributed between Alice and Bob: Bob's key $\{s, c\}$ is public and allows anyone
to send encrypted messages to Bob. If a quantum computer could be constructed that
implemented Shor's algorithm, key distribution protocols that rely on the difficulty of
factoring very large numbers would be insecure.

13 The discrete log of $x$ with respect to a given prime integer $p$ and an integer $q$ coprime to $p$ is the integer
$r$ such that $q^r = x \bmod p$. See [Nielsen and Chuang, 2000, p. 238] for a discussion.
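
A toy instance makes the roles of $s$, $c$ and $t$ concrete. The sketch below uses the notation of the text, but the specific primes, the public exponent, and the derivation of $t$ as the inverse of $s$ modulo $(p-1)(q-1)$ are illustrative assumptions (the last being the textbook construction); real keys use primes hundreds of digits long. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.

```python
from math import gcd

# Bob's secret primes (toy-sized; an assumption for illustration only)
p, q = 61, 53
c = p * q                        # public modulus c = pq
s = 17                           # public exponent, chosen coprime to (p-1)(q-1)
phi = (p - 1) * (q - 1)
assert gcd(s, phi) == 1
t = pow(s, -1, phi)              # Bob's private exponent: s*t = 1 mod (p-1)(q-1)

m = 65                           # Alice's message, a number smaller than c
e = pow(m, s, c)                 # encryption: e = m^s mod c
recovered = pow(e, t, c)         # decryption: m = e^t mod c

def eve_recovers_t(s, c):
    """Eve's attack: factor c by trial division, then derive t exactly as Bob does.
    Trial division is feasible only because c is tiny here."""
    f = next(d for d in range(2, int(c ** 0.5) + 1) if c % d == 0)
    return pow(s, -1, (f - 1) * (c // f - 1))
```

With these values `e` is 2790, `recovered` equals `m`, and `eve_recovers_t(s, c)` returns Bob's `t`: the secrecy of $t$ rests entirely on the difficulty of factoring $c$.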

6.2 Quantum Algorithms 

In the following three sections, I look at the information-processing involved in Deutsch's
XOR algorithm [1985], Simon's period-finding algorithm [1994, 1997], and Shor's
factorization algorithm [1994, 1997] in terms of the difference between the Boolean logic
underlying a classical computation and the non-Boolean logic represented by the
projective geometry of Hilbert space, in which the subspace structure of Hilbert space
replaces the set-theoretic structure of classical logic. The three algorithms all turn out
to involve a similar geometric formulation.

Basically, all three algorithms involve the determination of a global property of 
a function, i.e., a disjunctive property. The disjunction is represented as a subspace 
in an appropriate Hilbert space, and alternative possible disjunctions turn out to be 
represented as orthogonal subspaces, except for intersections or overlaps. The true dis- 
junction is determined as the subspace containing the state vector via a measurement. 
The algorithm generally has to be run several times because the state might be found in 
the overlap region. The essential feature of these quantum computations is that the true 
disjunction is distinguished from alternative disjunctions without determining the truth 
values of the disjuncts. In a classical computation, distinguishing the true disjunction
would be impossible without the prior determination of the truth values of the disjuncts.
More generally, a quantum computer computes a global property of a function with- 
out computing information that is redundant quantum mechanically, but essential for a 
classical computation of the global property. 

There are other quantum algorithms besides these three, e.g., Grover's search
algorithm [1997], which achieves a quadratic speed-up over any classical algorithm. For
a discussion, see [Nielsen and Chuang, 2000], [Jozsa, 1999].

6.2.1 Deutsch's XOR Algorithm and the Deutsch-Jozsa Algorithm

Let $B = \{0, 1\}$ be a Boolean algebra (or the additive group of integers mod 2). In
Deutsch's XOR problem [1985], we are given a 'black box' or oracle that computes
a function $f : B \to B$ and we are required to determine whether the function is
'constant' (takes the same value for both inputs) or 'balanced' (takes a different value
for each input). Classically, the only way to do this would be to consult the oracle
twice, for the input values 0 and 1, and compare the outputs.

In a quantum computation of the Boolean function, a unitary transformation
$U_f : |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$ corresponding to the 'black box' correlates input values with
corresponding output values.$^{14}$ The computation proceeds as follows: The input and
output registers are 1-qubit registers initialized to the state $|0\rangle|0\rangle$ in a standard basis. A
Hadamard transformation is applied to the input register, yielding a linear superposition
of states corresponding to the two possible input values 0 and 1, and the transformation
$U_f$ is then applied to both registers, yielding the transitions:

$$|0\rangle|0\rangle \xrightarrow{H} \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle \tag{215}$$

$$\xrightarrow{U_f} \frac{1}{\sqrt{2}}(|0\rangle|f(0)\rangle + |1\rangle|f(1)\rangle) \tag{216}$$

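If the function is constant, the final composite state of both registers is one of the
two orthogonal states:

$$|c_1\rangle = \frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|0\rangle) \tag{217}$$

$$|c_2\rangle = \frac{1}{\sqrt{2}}(|0\rangle|1\rangle + |1\rangle|1\rangle) \tag{218}$$

If the function is balanced, the final composite state is one of the two orthogonal states:

$$|b_1\rangle = \frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle) \tag{219}$$

$$|b_2\rangle = \frac{1}{\sqrt{2}}(|0\rangle|1\rangle + |1\rangle|0\rangle) \tag{220}$$

The states $|c_1\rangle, |c_2\rangle$ and $|b_1\rangle, |b_2\rangle$ span two planes $P_c$, $P_b$ in $\mathcal{H}^2 \otimes \mathcal{H}^2$, represented
by the projection operators:

$$P_c = P_{|c_1\rangle} + P_{|c_2\rangle} \tag{221}$$

$$P_b = P_{|b_1\rangle} + P_{|b_2\rangle} \tag{222}$$

These planes are orthogonal, except for an intersection, so their projection operators
commute. The intersection is the line (ray) spanned by the vector$^{15}$:

$$\frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle) = \frac{1}{\sqrt{2}}(|c_1\rangle + |c_2\rangle) = \frac{1}{\sqrt{2}}(|b_1\rangle + |b_2\rangle) \tag{223}$$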

In the 'prime' basis spanned by the states $|0'\rangle = H|0\rangle$, $|1'\rangle = H|1\rangle$ the intersection
is the state $|0'\rangle|0'\rangle$, the 'constant' plane is spanned by $|0'\rangle|0'\rangle$, $|0'\rangle|1'\rangle$, and the
'balanced' plane is spanned by $|0'\rangle|0'\rangle$, $|1'\rangle|1'\rangle$. Note that:

$$|0'\rangle|1'\rangle = \frac{1}{\sqrt{2}}(|c_1\rangle - |c_2\rangle) \tag{224}$$

$$|1'\rangle|1'\rangle = \frac{1}{\sqrt{2}}(|b_1\rangle - |b_2\rangle) \tag{225}$$

14 Note that two quantum registers are required to compute functions that are not 1-1 by a unitary
transformation. Different input values, $x$ and $y$, to a function $f$ are represented by orthogonal states $|x\rangle$, $|y\rangle$. So
if $f(x) = f(y)$ for some $x \neq y$, the transformation $W_f : |x\rangle \to |f(x)\rangle$ could not be unitary, because
the orthogonal states $|x\rangle$, $|y\rangle$ would have to be mapped onto the same state by $W_f$. The ability of unitary
transformations, which are reversible, to compute irreversible functions is achieved by keeping a record of
the input for each output value of the function.

15 Here $|00\rangle = |0\rangle|0\rangle$, etc.
In the usual formulation of the algorithm, to decide whether the function $f$ is constant
or balanced we measure the output register in the prime basis. If the outcome is 0'
(which is obtained with probability 1/2, whether the state ends up in the constant plane
or the balanced plane), the computation is inconclusive, yielding no information about
the function $f$. If the outcome is 1', then we measure the input register. If the outcome
of the measurement on the input register is 0', the function is constant; if it is 1', the
function is balanced.

Alternatively (and this will be relevant for the comparison with Simon's algorithm
and Shor's algorithm) we could measure the observable with eigenstates $|0'0'\rangle$, $|0'1'\rangle$,
$|1'0'\rangle$, $|1'1'\rangle$. The final state is in the 3-dimensional subspace orthogonal to the vector
$|1'0'\rangle$, either in the constant plane or the balanced plane. If the state is in the constant
plane, we will either obtain the outcome 0'0' with probability 1/2 (since the final
state is at an angle $\pi/4$ to $|0'0'\rangle$), in which case the computation is inconclusive, or
the outcome 0'1' with probability 1/2. If the state is in the balanced plane, we will
again obtain the outcome 0'0' with probability 1/2, in which case the computation is
inconclusive, or the outcome 1'1' with probability 1/2. So in either case, with probability
1/2, we can distinguish in one run of the algorithm between the two quantum
disjunctions 'constant' and 'balanced' represented by the planes:

$$P_c = P_{|0'0'\rangle} \vee P_{|0'1'\rangle} \tag{226}$$

$$P_b = P_{|0'0'\rangle} \vee P_{|1'1'\rangle} \tag{227}$$

without finding out the truth values of the disjuncts in the computation (i.e., whether
in the 'constant' case the function maps 0 to 0 and 1 to 0 or whether the function
maps 0 to 1 and 1 to 1, and similarly in the 'balanced' case). Note that we could also
apply a Hadamard transformation to the final states of both registers and measure in
the computational basis, since $|0'0'\rangle \xrightarrow{H} |00\rangle$, etc.
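
The outcome statistics just described can be checked by direct calculation. The following minimal sketch prepares the state $(|0\rangle|f(0)\rangle + |1\rangle|f(1)\rangle)/\sqrt{2}$ for a given $f$ and computes the probability of each outcome of a measurement in the prime basis, using $\langle a'|x\rangle = (-1)^{ax}/\sqrt{2}$. The function name and the dictionary representation are illustrative assumptions.

```python
import math

def deutsch_xor_distribution(f):
    """Return {outcome: probability} for a prime-basis measurement of both registers
    on the state (|0>|f(0)> + |1>|f(1)>)/sqrt(2)."""
    # Amplitudes indexed by (x, y) in the computational basis
    amp = {(x, y): 0.0 for x in (0, 1) for y in (0, 1)}
    for x in (0, 1):
        amp[(x, f(x))] += 1 / math.sqrt(2)
    # Change of basis: each qubit contributes a factor (-1)^(a*x) / sqrt(2)
    dist = {}
    for a in (0, 1):
        for b in (0, 1):
            coeff = sum((-1) ** (a * x + b * y) * amp[(x, y)] / 2
                        for x in (0, 1) for y in (0, 1))
            dist[f"{a}'{b}'"] = round(coeff ** 2, 12)
    return dist

constant_case = deutsch_xor_distribution(lambda x: 0)
balanced_case = deutsch_xor_distribution(lambda x: x)
```

Here `constant_case` assigns probability 1/2 each to 0'0' and 0'1', and `balanced_case` assigns 1/2 each to 0'0' and 1'1'. The outcome 1'0' never occurs, and 0'0' is the inconclusive outcome shared by both planes.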

Deutsch's XOR algorithm was the first quantum algorithm with a demonstrated 
speed-up over any classical algorithm performing the same computational task. How- 
ever, the algorithm has an even probability of failing, so the improvement in effi- 
ciency over a classical computation is only achieved if the algorithm succeeds, and 
even then is rather modest: one run of the quantum algorithm versus two runs of a 
classical algorithm. The following variation of Deutsch's algorithm avoids this feature
[Cleve et al., 1998].

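We begin by initializing the two registers to $|0\rangle$ and $|1\rangle$, respectively (instead of to
$|0\rangle$ and $|0\rangle$) and apply a Hadamard transformation to both registers, which yields the
transition:

$$|0\rangle|1\rangle \xrightarrow{H} \frac{|0\rangle + |1\rangle}{\sqrt{2}} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{228}$$

Since

$$U_f|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle \tag{229}$$

it follows that

$$U_f|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}} = \begin{cases} |x\rangle \dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(x) = 0 \\[2ex] -|x\rangle \dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(x) = 1 \end{cases} \tag{230}$$

which can be expressed as

$$U_f|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}} = (-1)^{f(x)}|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{231}$$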



Notice that the value of the function now appears as a phase of the final state of the input
register, a feature referred to as 'phase kickback.' For the input state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$,
we have:

$$U_f \frac{|0\rangle + |1\rangle}{\sqrt{2}} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} = \frac{(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle}{\sqrt{2}} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{232}$$

which can be expressed as:

$$U_f \frac{|0\rangle + |1\rangle}{\sqrt{2}} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} = \begin{cases} \pm|0'\rangle|1'\rangle & \text{if } f(0) = f(1) \\ \pm|1'\rangle|1'\rangle & \text{if } f(0) \neq f(1) \end{cases} \tag{233}$$

Instead of the final state of the two registers ending up as one of two orthogonal
states in the constant plane, or as one of two orthogonal states in the balanced plane, the
final state now ends up as $\pm|0'1'\rangle$ in the constant plane, or as $\pm|1'1'\rangle$ in the balanced
plane, and these states can be distinguished because they are orthogonal. So we can
decide with certainty whether the function is constant or balanced after only one run of
the algorithm. In fact, we can distinguish these two possibilities by simply measuring
the input register in the prime basis. Note that if we perform a final Hadamard
transformation on the input register (which takes $|0'\rangle$ to $|0\rangle$ and $|1'\rangle$ to $|1\rangle$), we can distinguish
the two possibilities by measuring the input register in the computational basis. Note
also that the state of the output register is unchanged: at the end of the process it is in
the state $|1'\rangle = H|1\rangle$ (as in (228)) and is not measured.

Deutsch's XOR problem can be generalized to the problem ('Deutsch's problem')
of determining whether a Boolean function $f : B^n \to B$ is constant or whether it is
balanced, where it is promised that the function is either constant or balanced.
'Balanced' here means that the function takes the values 0 and 1 an equal number of times,
i.e., $2^{n-1}$ times each. The Deutsch-Jozsa algorithm [1992] decides whether $f$ is
constant or balanced in one run.

We begin by setting the input $n$-qubit register to the state $|0\rangle$ (an abbreviation for the
state $|0 \cdots 0\rangle = |0\rangle \cdots |0\rangle$) and the output 1-qubit register to the state $|1\rangle$. We apply an
$n$-fold Hadamard transformation to the input register and a Hadamard transformation
to the output register, followed by the unitary transformation $U_f$ to both registers, and
finally an $n$-fold Hadamard transformation to the input register.



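First note that

$$H|x\rangle = \frac{1}{\sqrt{2}} \sum_{y \in \{0,1\}} (-1)^{xy}|y\rangle \tag{234}$$

so

$$H^{\otimes n}|x_1, \ldots, x_n\rangle = \frac{1}{\sqrt{2^n}} \sum_{y_1, \ldots, y_n} (-1)^{x_1 y_1 + \cdots + x_n y_n}|y_1, \ldots, y_n\rangle \tag{235}$$

This can be expressed as:

$$H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}} \sum_{y \in \{0,1\}^n} (-1)^{x \cdot y}|y\rangle \tag{236}$$

where $x \cdot y$ is the bitwise inner product of $x$ and $y$, mod 2.

The unitary transformations (Hadamard transformation, $U_f$, and the final Hadamard
transformation on the input register) yield:

$$|0\rangle^{\otimes n}|1\rangle \xrightarrow{H} \sum_{x \in \{0,1\}^n} \frac{|x\rangle}{\sqrt{2^n}} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{237}$$

$$\xrightarrow{U_f} \sum_{x} \frac{(-1)^{f(x)}|x\rangle}{\sqrt{2^n}} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{238}$$

$$\xrightarrow{H} \sum_{y} \sum_{x} \frac{(-1)^{x \cdot y + f(x)}|y\rangle}{2^n} \, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{239}$$

Now consider the state of the input register:

$$\sum_{y} \sum_{x} \frac{(-1)^{x \cdot y + f(x)}}{2^n}|y\rangle \tag{240}$$

Note that the coefficient (amplitude) of the state $|0 \ldots 0\rangle$ in the linear superposition
(240) is $\sum_x (-1)^{f(x)}/2^n$. If $f$ is constant, this coefficient is $\pm 1$, so the coefficients of the
other terms must all be 0. If $f$ is balanced, $f(x) = 0$ for half the values of $x$ and
$f(x) = 1$ for the other half, so the positive and negative contributions to the coefficient
of $|0 \ldots 0\rangle$ cancel to 0. In other words, if $f$ is constant, the state of the input register is
$\pm|0 \ldots 0\rangle$; if $f$ is balanced, the state is in the orthogonal subspace.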

This is the usual way of describing how the algorithm works, which rather obscures
the geometric picture. Consider, for simplicity, the case $n = 2$. After the transformation
$U_f$, but before the final Hadamard transformation, the state of the input register
is:

$$\pm\frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle) \tag{241}$$

if the function is constant, or a state of the form:

$$\frac{1}{2}(\pm|00\rangle \pm |01\rangle \pm |10\rangle \pm |11\rangle) \tag{242}$$

if the function is balanced, where two of the coefficients are $+1$ and two of the coefficients
are $-1$. Evidently, there are six such balanced states, and they are all orthogonal
to the constant state. So the six balanced states lie in a 3-dimensional subspace orthogonal
to the constant state and can therefore be distinguished from the constant state.
The final Hadamard transformation transforms the constant state:

$$\pm\frac{1}{2}(|00\rangle + |01\rangle + |10\rangle + |11\rangle) \xrightarrow{H} \pm|00\rangle \tag{243}$$

and the six balanced states to states in the 3-dimensional subspace orthogonal to $|00\rangle$.
So to decide whether the function is constant or balanced we need only measure the
input register and check whether it is in the state $|00\rangle$.
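
The amplitude formula $\sum_x (-1)^{x \cdot y + f(x)}/2^n$ for the final state of the input register can be evaluated directly for small $n$. The sketch below does this for $n = 2$ and confirms the geometric claim for the two constant and six balanced functions. It is a classical evaluation of the amplitudes (it calls $f$ on every input), so it illustrates the state geometry rather than the one-query efficiency; all function names are assumptions.

```python
from itertools import combinations, product

def dot(x, y):
    """Bitwise inner product of two bit tuples, mod 2."""
    return sum(a * b for a, b in zip(x, y)) % 2

def input_register_amplitudes(f, n):
    """Amplitude of each |y> after the final Hadamard: sum_x (-1)^(x.y + f(x)) / 2^n."""
    points = list(product((0, 1), repeat=n))
    return {y: sum((-1) ** (dot(x, y) + f(x)) for x in points) / 2 ** n
            for y in points}

def decide(f, n):
    """'constant' if the input register is in +-|0...0>, otherwise 'balanced'."""
    amplitude_at_zero = input_register_amplitudes(f, n)[(0,) * n]
    return "constant" if abs(amplitude_at_zero) > 0.5 else "balanced"

n = 2
points = list(product((0, 1), repeat=n))
constant_functions = [lambda x, v=v: v for v in (0, 1)]
# A balanced function takes the value 1 on exactly half of the inputs
balanced_functions = [lambda x, ones=frozenset(ones): int(x in ones)
                      for ones in combinations(points, len(points) // 2)]
```

For every function in `constant_functions` the amplitude at `(0, 0)` is $\pm 1$, and for each of the six functions in `balanced_functions` it is exactly 0, so `decide` separates the two classes.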



6.2.2 Simon's Algorithm 

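The problem here is to find the period $r$ of a periodic function $f : B^n \to B^n$, i.e., a
Boolean function for which

$$f(x_i) = f(x_j) \text{ if and only if } x_j = x_i \oplus r, \text{ for all } x_i, x_j \in B^n \tag{244}$$

Note that since $x \oplus r \oplus r = x$, the function is 2-to-1.

Simon's algorithm solves the problem efficiently, with an exponential speed-up
over any classical algorithm (see [Simon, 1994], [Simon, 1997]). The algorithm
proceeds as in the Deutsch-Jozsa algorithm, starting with the input and output registers in
the state $|0 \ldots 0\rangle|0\rangle$ in the computational basis:

$$|0 \ldots 0\rangle|0\rangle \xrightarrow{H} \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle|0\rangle \tag{245}$$

$$\xrightarrow{U_f} \frac{1}{\sqrt{2^n}} \sum_{x} |x\rangle|f(x)\rangle \tag{246}$$

$$= \frac{1}{\sqrt{2^{n-1}}} \sum_{x_i} \frac{|x_i\rangle + |x_i \oplus r\rangle}{\sqrt{2}} |f(x_i)\rangle \tag{247}$$

where $U_f$ is the unitary transformation implementing the Boolean function as:

$$U_f : |x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle \tag{248}$$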

The usual way to see how the algorithm works is to consider what happens if we
measure the output register and keep the state of the input register,$^{16}$ which will have
the form:

$$\frac{|x_i\rangle + |x_i \oplus r\rangle}{\sqrt{2}} \tag{249}$$

This state contains the information $r$, but summed with an unwanted randomly chosen
offset $x_i$ that depends on the measurement outcome. A direct measurement of the state
label would yield any $x \in B^n$ equiprobably, providing no information about $r$.

16 The measurement of the output register here is a pedagogical device for ease of conceptualization. Only
the input register is actually measured. The input register is in a mixture of states, which we can think of as
the mixture associated with the distribution of outcomes obtained by measuring the output register.

We now apply a Hadamard transform:

$$\frac{|x_i\rangle + |x_i \oplus r\rangle}{\sqrt{2}} \xrightarrow{H} \frac{1}{\sqrt{2^n}} \sum_{y \in B^n} \frac{(-1)^{x_i \cdot y} + (-1)^{(x_i \oplus r) \cdot y}}{\sqrt{2}} |y\rangle \tag{250}$$

$$= \sum_{y : r \cdot y = 0} \frac{(-1)^{x_i \cdot y}}{\sqrt{2^{n-1}}} |y\rangle \tag{251}$$

where the last equality follows because terms interfere destructively if $r \cdot y = 1$.
Finally, we measure the input register in the computational basis and obtain a value $y$
(equiprobably) such that $r \cdot y = 0$. Then we repeat the algorithm sufficiently many
times to find enough values so that $r$ can be determined by solving the linear equations
$r \cdot y_1 = 0, \ldots, r \cdot y_k = 0$.
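
The classical post-processing step, recovering $r$ from the sampled values $y_1, \ldots, y_k$, can be sketched as follows. Two simplifications are assumed and should be read as such: the quantum run is mimicked by drawing a uniformly random $y$ with $r \cdot y = 0$ (which requires knowing $r$, so this is a stand-in for the device and not a simulation of it), and the linear equations are solved by brute-force filtering of candidate periods rather than by Gaussian elimination over the integers mod 2, which is what an efficient implementation would use.

```python
import random
from itertools import product

def dot(x, y):
    """Bitwise inner product of two bit tuples, mod 2."""
    return sum(a * b for a, b in zip(x, y)) % 2

def simon_sample(r, rng):
    """Stand-in for one run of the algorithm: a uniformly random y with r.y = 0."""
    candidates = [y for y in product((0, 1), repeat=len(r)) if dot(r, y) == 0]
    return rng.choice(candidates)

def find_period(n, sample):
    """Repeat runs until r.y_1 = 0, ..., r.y_k = 0 has a unique non-zero solution r."""
    consistent = [r for r in product((0, 1), repeat=n) if any(r)]
    runs = 0
    while len(consistent) > 1:
        y = sample()
        runs += 1
        consistent = [r for r in consistent if dot(r, y) == 0]
    return consistent[0], runs

rng = random.Random(1)                 # fixed seed so the sketch is repeatable
hidden_period = (1, 0, 1)
found, runs = find_period(3, lambda: simon_sample(hidden_period, rng))
```

Here `found` equals `hidden_period`. The number of runs varies with the seed, because outcomes such as $y = 0 \ldots 0$ carry no information and are simply discarded by the filter.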

To see what is going on geometrically, consider the case $n = 2$. The possible
values of the period $r$ are: 01, 10, 11, and the corresponding states of the input and
output registers after the unitary transformation $U_f$ are:

$$\begin{aligned}
r = 01 &: \quad (|00\rangle + |01\rangle)|f(00)\rangle + (|10\rangle + |11\rangle)|f(10)\rangle \\
r = 10 &: \quad (|00\rangle + |10\rangle)|f(00)\rangle + (|01\rangle + |11\rangle)|f(01)\rangle \\
r = 11 &: \quad (|00\rangle + |11\rangle)|f(00)\rangle + (|01\rangle + |10\rangle)|f(01)\rangle
\end{aligned}$$

Notice that this case reduces to the same geometric construction as in Deutsch's
XOR algorithm. For $r = 10$ the input register states are $|c_1\rangle = |00\rangle + |10\rangle$ or $|c_2\rangle =
|01\rangle + |11\rangle$, and for $r = 11$ the input register states are $|b_1\rangle = |00\rangle + |11\rangle$ or $|b_2\rangle =
|01\rangle + |10\rangle$, depending on the outcome of the measurement of the output register. So the
three possible periods are associated with three planes in $\mathcal{H}^2 \otimes \mathcal{H}^2$, which correspond
to the constant and balanced planes in Deutsch's XOR algorithm, and a third plane,
all three planes intersecting in the line spanned by the vector $|0'0'\rangle$. In the prime basis
obtained by applying the Hadamard transformation, the planes are as follows:

r = 01 : plane spanned by $|0'0'\rangle$, $|1'0'\rangle$

r = 10 : plane spanned by $|0'0'\rangle$, $|0'1'\rangle$ (corresponds to 'constant' plane)

r = 11 : plane spanned by $|0'0'\rangle$, $|1'1'\rangle$ (corresponds to 'balanced' plane)

We could simply measure the input register in the prime basis to find the period.
Alternatively, we could apply a Hadamard transformation (which amounts to dropping
the primes in the above representation of the $r$-planes) and measure in the computational
basis. The three planes are orthogonal, except for their intersection in the line
spanned by the vector $|00\rangle$. The three possible periods can therefore be distinguished
by measuring the observable with eigenstates $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$, except when the
state of the register is projected by the measurement ('collapses') onto the intersection
state $|00\rangle$ (which occurs with probability 1/2). So the algorithm will generally have to
be repeated until we find an outcome that is not 00.

The $n = 2$ case of Simon's algorithm reduces to Deutsch's XOR algorithm. What
about other cases? We can see what happens in the general case if we consider the case
$n = 3$. There are now seven possible periods: 001, 010, 011, 100, 101, 110, 111. For
the period $r = 001$, the state of the two registers after the unitary transformation $U_f$ is:

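$$(|000\rangle + |001\rangle)|f(000)\rangle + (|010\rangle + |011\rangle)|f(010)\rangle + (|100\rangle + |101\rangle)|f(100)\rangle + (|110\rangle + |111\rangle)|f(110)\rangle \tag{252}$$

If we measure the output register, the input register is left in one of four states,
depending on the outcome of the measurement:

$$\begin{aligned}
|000\rangle + |001\rangle &= |0'0'0'\rangle + |0'1'0'\rangle + |1'0'0'\rangle + |1'1'0'\rangle \\
|010\rangle + |011\rangle &= |0'0'0'\rangle - |0'1'0'\rangle + |1'0'0'\rangle - |1'1'0'\rangle \\
|100\rangle + |101\rangle &= |0'0'0'\rangle + |0'1'0'\rangle - |1'0'0'\rangle - |1'1'0'\rangle \\
|110\rangle + |111\rangle &= |0'0'0'\rangle - |0'1'0'\rangle - |1'0'0'\rangle + |1'1'0'\rangle
\end{aligned}$$

Applying a Hadamard transformation amounts to dropping the primes. So if the period
is $r = 001$, the state of the input register ends up in the 4-dimensional subspace of
$\mathcal{H}^2 \otimes \mathcal{H}^2 \otimes \mathcal{H}^2$ spanned by the vectors: $|000\rangle$, $|010\rangle$, $|100\rangle$, $|110\rangle$.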

A similar analysis applies to the other six possible periods. The corresponding
subspaces are spanned by the following vectors:

r = 001 : $|000\rangle$, $|010\rangle$, $|100\rangle$, $|110\rangle$

r = 010 : $|000\rangle$, $|001\rangle$, $|100\rangle$, $|101\rangle$

r = 011 : $|000\rangle$, $|011\rangle$, $|100\rangle$, $|111\rangle$

r = 100 : $|000\rangle$, $|001\rangle$, $|010\rangle$, $|011\rangle$

r = 101 : $|000\rangle$, $|010\rangle$, $|101\rangle$, $|111\rangle$

r = 110 : $|000\rangle$, $|001\rangle$, $|110\rangle$, $|111\rangle$

r = 111 : $|000\rangle$, $|011\rangle$, $|101\rangle$, $|110\rangle$



These subspaces are orthogonal except for intersections in 2-dimensional planes. 
The period can be found by measuring in the computational basis. Repetitions of the 
measurement will eventually yield sufficiently many distinct values to determine in
which subspace out of the seven possibilities the final state lies. In this case ($n = 3$),
it is clear by examining the above list that two values distinct from 000 suffice to
determine the subspace, and these are just the values $y_i$ for which $y_i \cdot r = 0$. Note that
the subspaces correspond to quantum disjunctions. So determining the period of the
function by Simon's algorithm amounts to determining which disjunction out of the
seven alternative disjunctions is true, i.e., which subspace contains the state, without
determining the truth values of the disjuncts.
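
The list of subspaces for $n = 3$ and the two claims made about it can be regenerated mechanically from the condition $y \cdot r = 0$. The sketch below is illustrative only; the names `subspace_labels` and `table` are assumptions.

```python
from itertools import combinations, product

def dot(x, y):
    """Bitwise inner product of two bit tuples, mod 2."""
    return sum(a * b for a, b in zip(x, y)) % 2

def subspace_labels(r):
    """Labels y of the computational basis vectors spanning the subspace for period r."""
    return ["".join(map(str, y)) for y in product((0, 1), repeat=len(r))
            if dot(r, y) == 0]

# One entry per non-zero period r, keyed by its bit string
table = {"".join(map(str, r)): subspace_labels(r)
         for r in product((0, 1), repeat=3) if any(r)}

nonzero_labels = [format(i, "03b") for i in range(1, 8)]

# Any two distinct outcomes other than 000 single out exactly one subspace
two_values_suffice = all(
    sum(1 for labels in table.values() if y1 in labels and y2 in labels) == 1
    for y1, y2 in combinations(nonzero_labels, 2))

# Any two distinct subspaces share exactly two basis vectors, i.e. overlap in a plane
overlaps_are_planes = all(
    len(set(table[r1]) & set(table[r2])) == 2
    for r1, r2 in combinations(table, 2))
```

Both flags evaluate to `True`, and `table["001"]` reproduces the first row of the list, `['000', '010', '100', '110']`.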
6.2.3 Shor's Algorithm



Shor's factorization algorithm exploits the fact that the two prime factors $p, q$ of a
positive integer $N = pq$ can be found by determining the period of a function $f(x) =
a^x \bmod N$, for any $a < N$ which is coprime to $N$, i.e., has no common factors with
$N$ other than 1. The period $r$ of $f(x)$ depends on $a$ and $N$. Once we know the period,
we can factor $N$ if $r$ is even and $a^{r/2} \neq -1 \bmod N$, which will be the case with
probability greater than 1/2 if $a$ is chosen randomly. (If not, we choose another value
of $a$.) The factors of $N$ are the greatest common factors of $a^{r/2} \pm 1$ and $N$, which can
be found in polynomial time by the Euclidean algorithm. (For these number-theoretic
results, see [Nielsen and Chuang, 2000, Appendix 4].) So the problem of factorizing a
composite integer $N$ that is the product of two primes reduces to the problem of finding
the period of a certain periodic function $f : Z_s \to Z_N$, where $Z_n$ is the additive group
of integers mod $n$ (rather than $B^n$, the $n$-fold Cartesian product of a Boolean algebra
$B$, as in Simon's algorithm). Note that $f(x + r) = f(x)$ if $x + r < s$. The function $f$
is periodic if $r$ divides $s$ exactly, otherwise it is almost periodic.
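
The number-theoretic reduction can be tried out on a small modulus. In the sketch below the period is found by brute force, which is exactly the step the quantum algorithm is meant to replace, so the code illustrates only the classical bookkeeping around it. The function names and the choice $N = 15$ are illustrative assumptions.

```python
from math import gcd

def period(a, N):
    """Smallest r > 0 with a^r = 1 (mod N), found by brute force; a must be coprime to N."""
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def factors_from_period(a, N):
    """Return (r, factors); factors is None when r is odd or a^(r/2) = -1 (mod N)."""
    r = period(a, N)
    if r % 2 == 1:
        return r, None
    half = pow(a, r // 2, N)
    if half == N - 1:
        return r, None
    return r, (gcd(half - 1, N), gcd(half + 1, N))

N = 15
results = {a: factors_from_period(a, N) for a in range(2, N) if gcd(a, N) == 1}
```

For $N = 15$ every admissible $a$ except $a = 14$ yields the factors 3 and 5. For $a = 14$ the period is 2 and $a^{r/2} = -1 \bmod 15$, so the method fails and another value of $a$ must be chosen.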

Consider first the general form of the algorithm, as it is usually formulated. We
begin by initializing the input register ($s$ qubits) to the state $|0\rangle \in \mathcal{H}^s$ and the output
register ($N$ qubits) to the state $|0\rangle \in \mathcal{H}^N$. An $s$-fold Hadamard transformation is
applied to the input register, followed by the unitary transformation $U_f$ which implements
the function $f(x) = a^x \bmod N$:

$$|0\rangle|0\rangle \xrightarrow{H} \frac{1}{\sqrt{s}} \sum_{x=0}^{s-1} |x\rangle|0\rangle \tag{253}$$

$$\xrightarrow{U_f} \frac{1}{\sqrt{s}} \sum_{x=0}^{s-1} U_f|x\rangle|0\rangle \tag{254}$$

$$= \frac{1}{\sqrt{s}} \sum_{x=0}^{s-1} |x\rangle|a^x \bmod N\rangle \tag{255}$$

Then we measure the output register in the computational basis$^{17}$ and obtain a state of
the following form for the input register:

$$\frac{1}{\sqrt{s/r}} \sum_{j=0}^{s/r-1} |x_i + jr\rangle \tag{256}$$

This will be the case if $r$ divides $s$ exactly. The value $x_i$ is the offset, which depends
on the outcome $i$ of the measurement of the output register. The sum is taken over
the values of $j$ for which $f(x_i + jr) = i$. When $r$ does not divide $s$ exactly, the
analysis is a little more complicated. For a discussion, see [Barenco, 1998, p. 164],
[Jozsa, 1997b]. Since the state label contains the random offset, a direct measurement
of the label yields no information about the period.

17 As in the discussion of Simon's algorithm, this measurement is purely hypothetical, introduced to
simplify the analysis. Only the input register is actually measured.
A discrete Fourier transform for the integers mod $s$ is now applied to the input
register, i.e., a unitary transformation:

$$U_{DFT_s} : |x\rangle \to \frac{1}{\sqrt{s}} \sum_{y=0}^{s-1} e^{2\pi i \frac{xy}{s}} |y\rangle, \text{ for } x \in Z_s \tag{257}$$

This yields the transition:

$$\frac{1}{\sqrt{s/r}} \sum_{j=0}^{s/r-1} |x_i + jr\rangle \xrightarrow{U_{DFT_s}} \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} e^{2\pi i \frac{x_i k}{r}} |ks/r\rangle \tag{258}$$

and so shifts the offset into a phase factor and inverts the period as a multiple of $s/r$. A
measurement of the input register in the computational basis yields $c = ks/r$. The
algorithm is run a number of times until a value of $k$ coprime to $r$ is obtained. Cancelling
$c/s$ to lowest terms then yields $k$ and $r$ as $k/r$.

Since we don't know the value of r in advance of applying the algorithm, we do 
not, of course, recognize when a measurement outcome yields a value of k coprime to 
r. The idea is to run the algorithm, cancel c/s to lowest terms to obtain a candidate 
value for r and hence a candidate factor of N, which can then be tested by division 
into N. Even when we do obtain a value of k coprime to r, some values of a will
yield a period for which the method fails to yield a factor of N, in which case we 
randomly choose a new value of a and run the algorithm with this value. The point is 
that all these steps are efficient, i.e., can be performed in polynomial time, and since 
only a polynomial number of repetitions are required to determine a factor with any 
given probability p < 1, the algorithm is a polynomial-time algorithm, achieving an 
exponential speed-up over any known classical algorithm. 

To see how the algorithm functions geometrically, consider the case $N = 15$, $a = 7$
and $s = 64$ discussed in [Barenco, 1998, p. 160]. In this case, the function $f(x) =
a^x \bmod 15$ is:

$$\begin{aligned}
7^0 \bmod 15 &= 1 \\
7^1 \bmod 15 &= 7 \\
7^2 \bmod 15 &= 4 \\
7^3 \bmod 15 &= 13 \\
7^4 \bmod 15 &= 1
\end{aligned}$$

and the period is evidently $r = 4$.$^{18}$ After the application of the unitary transformation
$U_f$ implementing $f(x) = a^x \bmod N$, the state of the two registers is:

$$\begin{aligned}
\frac{1}{8}(&|0\rangle|1\rangle + |1\rangle|7\rangle + |2\rangle|4\rangle + |3\rangle|13\rangle \\
&+ |4\rangle|1\rangle + |5\rangle|7\rangle + |6\rangle|4\rangle + |7\rangle|13\rangle \\
&+ \cdots \\
&+ |60\rangle|1\rangle + |61\rangle|7\rangle + |62\rangle|4\rangle + |63\rangle|13\rangle)
\end{aligned} \tag{259}$$

18 The factors 3 and 5 of 15 are derived as the greatest common factors of $a^{r/2} - 1 = 48$ and 15 and
$a^{r/2} + 1 = 50$ and 15, respectively.

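This state can be expressed as:

$$\begin{aligned}
&\tfrac{1}{8}(|0\rangle + |4\rangle + |8\rangle + \cdots + |60\rangle)|1\rangle \\
+\ &\tfrac{1}{8}(|1\rangle + |5\rangle + |9\rangle + \cdots + |61\rangle)|7\rangle \\
+\ &\tfrac{1}{8}(|2\rangle + |6\rangle + |10\rangle + \cdots + |62\rangle)|4\rangle \\
+\ &\tfrac{1}{8}(|3\rangle + |7\rangle + |11\rangle + \cdots + |63\rangle)|13\rangle
\end{aligned} \tag{260}$$

If we measure the output register, we obtain (equiprobably) one of four states for
the input register, depending on the outcome of the measurement: 1, 7, 4, or 13:

$$\tfrac{1}{4}(|0\rangle + |4\rangle + |8\rangle + \cdots + |60\rangle) \tag{261}$$

$$\tfrac{1}{4}(|1\rangle + |5\rangle + |9\rangle + \cdots + |61\rangle) \tag{262}$$

$$\tfrac{1}{4}(|2\rangle + |6\rangle + |10\rangle + \cdots + |62\rangle) \tag{263}$$

$$\tfrac{1}{4}(|3\rangle + |7\rangle + |11\rangle + \cdots + |63\rangle) \tag{264}$$

These are the states (256) for values of the offset 0, 1, 2, 3. Application of the discrete
Fourier transform yields:

$$\begin{aligned}
x_1 = 0 &: \quad \tfrac{1}{2}(|0\rangle + |16\rangle + |32\rangle + |48\rangle) \\
x_7 = 1 &: \quad \tfrac{1}{2}(|0\rangle + i|16\rangle - |32\rangle - i|48\rangle) \\
x_4 = 2 &: \quad \tfrac{1}{2}(|0\rangle - |16\rangle + |32\rangle - |48\rangle) \\
x_{13} = 3 &: \quad \tfrac{1}{2}(|0\rangle - i|16\rangle - |32\rangle + i|48\rangle)
\end{aligned}$$

which are the states in (258). So for the period $r = 4$, the state of the input register
ends up in the 4-dimensional subspace spanned by the vectors $|0\rangle$, $|16\rangle$, $|32\rangle$, $|48\rangle$.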
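
The four transformed states can be reproduced by applying the discrete Fourier transform mod 64 to the periodic states numerically. The sketch below is a direct $O(s^2)$ evaluation of the transform, chosen for transparency over speed; the function names and the rounding used for display are assumptions.

```python
import cmath
import math

def dft(amplitudes):
    """Discrete Fourier transform mod s: |x> -> s^(-1/2) sum_y e^(2 pi i x y / s) |y>."""
    s = len(amplitudes)
    return [sum(amplitudes[x] * cmath.exp(2j * math.pi * x * y / s) for x in range(s))
            / math.sqrt(s) for y in range(s)]

def periodic_state(offset, r, s):
    """Equal superposition of |offset + j r> for j = 0, ..., s/r - 1 (r must divide s)."""
    norm = 1 / math.sqrt(s // r)
    return [norm if x % r == offset else 0.0 for x in range(s)]

def support(amplitudes, tol=1e-9):
    """Non-negligible amplitudes, rounded for readability, keyed by basis label."""
    return {y: complex(round(a.real, 6), round(a.imag, 6))
            for y, a in enumerate(amplitudes) if abs(a) > tol}

transformed = {offset: support(dft(periodic_state(offset, 4, 64))) for offset in range(4)}
```

Each entry of `transformed` is supported on the labels 0, 16, 32, 48 only, and for offset 1 the amplitudes are 1/2, i/2, -1/2, -i/2, matching the second line of the list above: the offset survives only as a phase.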

Now consider all possible even periods $r$ for which $f(x) = a^x \bmod 15$, where
$a$ is coprime to 15. The other possible values of $a$ are 2, 4, 8, 11, 13, 14 and the
corresponding periods turn out to be 4, 2, 4, 2, 4, 2. So we need only consider $r = 2$.$^{19}$

For $r = 2$, if we measure the output register, we will obtain (equiprobably) one of
two states for the input register, depending on the outcome of the measurement (say, $a$
or $b$):

$$|0\rangle + |2\rangle + |4\rangle + \cdots + |62\rangle \tag{265}$$

$$|1\rangle + |3\rangle + |5\rangle + \cdots + |63\rangle \tag{266}$$

After the discrete Fourier transform, these states are transformed to:

$$\begin{aligned}
x_a = 0 &: \quad |0\rangle + |32\rangle \\
x_b = 1 &: \quad |0\rangle - |32\rangle
\end{aligned}$$

In this case, the 2-dimensional subspace $V_{r=2}$ spanned by $|0\rangle$, $|32\rangle$ for $r = 2$ is
included in the 4-dimensional subspace $V_{r=4}$ for $r = 4$. A measurement can distinguish
$r = 4$ from $r = 2$ reliably, i.e., whether the final state of the input register is in $V_{r=4}$
or $V_{r=2}$, only if the final state is in $V_{r=4} - V_{r=2}$, the part of $V_{r=4}$ orthogonal to $V_{r=2}$.
What happens if the final state ends up in $V_{r=2}$?

19 Every value of $a$ except $a = 14$ yields the correct factors for 15. For $a = 14$, the method fails: $r = 2$,
so $a^{r/2} = -1 \bmod 15$.

Shor's algorithm works as a randomized algorithm. As mentioned above, it pro- 
duces a candidate value for the period r and hence a candidate factor of N, which can 
be tested (in polynomial time) by division into N. A measurement of the input register
in the computational basis yields an outcome c = ks/r. The value of k is chosen
equiprobably by the measurement of the output register. The procedure is to repeat the 
algorithm until the outcome yields a value of k coprime to r, in which case canceling 
c/s to lowest terms yields k and r as k/r. 

For example, suppose we choose a = 7, in which case (unknown to us) r = 4. The 
values of k coprime to r are k = 1 and k = 3 (this is also unknown to us, because
k depends on the value of r). Then c/s cancelled to lowest terms is 1/4 and 3/4,
respectively, both of which yield the correct period. From the geometrical perspective,
these values of k correspond to finding the state after measurement in the computational
basis to be $|16\rangle$ or $|48\rangle$, both of which do distinguish $V_{r=4}$ from $V_{r=2}$.

Suppose we choose a value of a with period r = 2 and find the value c = 32. The 
only value of k coprime to r is k = 1. Then c/s cancelled to lowest terms is 1/2, which
yields the correct period, and hence the correct factors of N. But c = 32 could also be
obtained for a = 7, r = 4, and k = 2, which does not yield the correct period, and
hence does not yield the correct factors of N. Putting it geometrically: the value k = 1
for r = 2 corresponds to the same state, $|32\rangle$, as the value k = 2 for r = 4. Once we
obtain the candidate period r = 2 (by cancelling c/s = 32/64 to lowest terms), we
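calculate the factors of N as the greatest common factors of a ± 1 and N and test these
by division into N. If a = 7, the candidate period is incorrect (7^2 = 4 mod 15, rather than 1)
and the calculated values, 3 and 1, are not the two factors of N (although 3 happens to
divide 15). If a = 4, say, for which the period really is r = 2, the factors calculated in
this way, 3 and 5, will be correct.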
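
The candidate-and-test procedure for a single measurement outcome $c$ can be sketched as follows. The function name, the returned triple, and the choice of $a = 4$ as a value with period 2 are illustrative assumptions; `Fraction` performs the cancellation of $c/s$ to lowest terms.

```python
from fractions import Fraction
from math import gcd

def analyse_outcome(a, N, c, s):
    """Cancel c/s to lowest terms k/r and test the candidate period r.

    Returns (r, r_is_a_period, proper_factors), where proper_factors lists those of
    gcd(a^(r/2) - 1, N) and gcd(a^(r/2) + 1, N) that lie strictly between 1 and N."""
    r = Fraction(c, s).denominator
    if r % 2 == 1:
        return r, False, ()
    half = pow(a, r // 2, N)
    found = tuple(sorted(g for g in (gcd(half - 1, N), gcd(half + 1, N)) if 1 < g < N))
    return r, pow(a, r, N) == 1, found

ambiguous = analyse_outcome(7, 15, 32, 64)   # k = 2, r = 4: candidate r = 2 is wrong
genuine = analyse_outcome(4, 15, 32, 64)     # k = 1, r = 2: candidate r = 2 is right
decisive = analyse_outcome(7, 15, 16, 64)    # k = 1, r = 4: candidate r = 4 is right
```

For $a = 7$ and $c = 32$ the candidate $r = 2$ fails the period test and only the single factor 3 turns up; for $a = 4$ the same outcome gives both factors, as do the outcomes $c = 16$ and $c = 48$ for $a = 7$. The classical tests are what separate the cases that the shared state $|32\rangle$ cannot.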

We see that, with the added information provided by the outcome of a test division 
of a candidate factor into N, Shor's randomized algorithm again amounts to deter- 
mining which disjunction among alternative disjunctions is true, i.e., which subspace 
contains the state, without determining the truth values of the disjuncts.

6.3 Where Does the Speed-Up Come From? 

What, precisely, is the feature of a quantum computer responsible for the phenomenal 
efficiency over a classical computer? In the case of Simon's algorithm, the speed-up is 
exponential over any classical algorithm; in the case of Shor's algorithm, the speed-up 
is exponential over any known classical algorithm. 
Steane [1998] remarks:

The period finding algorithm appears at first sight like a conjuring 
trick: it is not quite clear how the quantum computer managed to produce 
the period like a rabbit out of a hat. ... I would say that the most important 
features are contained in [$|\psi\rangle = \frac{1}{\sqrt{s}} \sum_{x=0}^{s-1} |x\rangle |f(x)\rangle$]. They are not only the
quantum parallelism already mentioned, but also quantum entanglement,
and, finally, quantum interference. Each value of f(x) retains a link with
the value of x which produced it, through entanglement of the x and y registers
in $|\psi\rangle$. The 'magic' happens when a measurement of the y register
produces the special state [$\frac{1}{\sqrt{s/r}} \sum_{j=0}^{s/r-1} |x_i + jr\rangle$] in the x-register, and
it is quantum entanglement which permits this (see also [Jozsa, 1997a]).
The final Fourier transform can be regarded as an interference between the 
various superposed states in the x-register (compare with the action of a 
diffraction grating). 

Interference effects can be used for computational purposes with classical
light fields, or water waves for that matter, so interference is not in
itself the essentially quantum feature. Rather, the exponentially large num- 
ber of interfering states, and the entanglement, are features which do not 
arise in classical systems. 

Jozsa points out [1997a] that the state space (phase space) of a composite classical
system is the Cartesian product of the state spaces of its subsystems, while the state
space of a composite quantum system is the tensor product of the state spaces of its
subsystems. For n qubits, the quantum state space has $2^n$ dimensions. So the information
required to represent a general state increases exponentially with n: even if we
restrict the specification of the amplitudes to numbers of finite precision, a superposition
will in general have $O(2^n)$ components. For a classical composite system of n
two-level subsystems, the number of possible states grows exponentially with n, but 
the information required to represent a general state is just n times the information re- 
quired to represent a single two-level system, i.e., the information grows only linearly 
with n because the state of a composite system is just a product state. 
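
The contrast in parameter counts can be made concrete with a small sketch. The helper names are hypothetical, and the product-state test used here (vanishing of a00·a11 − a01·a10) applies only to two-qubit pure states; it is included just to show that a general state need not factorize.

```python
from math import sqrt


def tensor(u, v):
    """Tensor (Kronecker) product of two amplitude vectors."""
    return [x * y for x in u for y in v]


def product_state(qubits):
    """Joint amplitude vector of independently prepared qubits."""
    state = [1.0]
    for q in qubits:
        state = tensor(state, q)
    return state


def is_product_2q(state, tol=1e-12):
    """A two-qubit pure state (a00, a01, a10, a11) is a product state
    exactly when a00*a11 - a01*a10 vanishes."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol


n = 10
plus = [1 / sqrt(2), 1 / sqrt(2)]
psi = product_state([plus] * n)

numbers_for_classical_state = n        # one two-level value per subsystem
numbers_for_product_state = 2 * n      # two amplitudes per qubit
numbers_for_general_state = len(psi)   # 2**n amplitudes

# An entangled two-qubit state: (|00> + |11>)/sqrt(2).
bell = [1 / sqrt(2), 0.0, 0.0, 1 / sqrt(2)]
print(numbers_for_classical_state, numbers_for_product_state,
      numbers_for_general_state, is_product_2q(bell))
```

A product state of n qubits is fixed by 2n amplitudes, while the vector it lives in has $2^n$ components, and an entangled state such as `bell` cannot be written in the factorized form at all.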

More formally, Jozsa and Linden [2002] have shown that a quantum algorithm operating
on pure states can achieve an exponential speed-up over classical algorithms
only if the quantum algorithm involves multi-partite entanglement that increases
unboundedly with the input size. Similarly, Vidal [2003] has shown that a classical
computer can simulate the evolution of a pure state of n qubits with computational resources
that grow linearly with n and exponentially in multi-partite entanglement.

The essential feature of the quantum computations discussed above in §6.2 is the
selection of a disjunction, representing a global property of a function, among alternative
possible disjunctions without computing the truth values of the disjuncts, which is
redundant information in a quantum computation but essential information classically.
Note that a quantum disjunction is represented by a subspace of entangled states in
the tensor product Hilbert space of the input and output registers. This is analogous to
the procedure involved in the key observation underlying the proof of the quantum bit
commitment theorem discussed in §5.2.2. The series of operations described by
equations (67)-(72), in which the channel particle is entangled with ancilla systems and
the ancillas are subsequently measured, effectively constitutes a quantum computation.

The first stage of a quantum algorithm involves the creation of a state in which every
input value to the function is correlated with a corresponding output value. This is
referred to as 'quantum parallelism' and is sometimes cited as the source of the speed-up
in a quantum computation. The idea is that a quantum computation is something
like a massively parallel classical computation, for all possible values of a function.
This appears to be Deutsch's view, with the parallel computations taking place in parallel
universes. For a critique, see [Steane, 2003], who defends a view similar to that
presented here. Of course, all these different values are inaccessible: a measurement
in the computational basis will only yield (randomly) one correlated input-output pair.
Further processing is required, including the final discrete Fourier transform for the
three algorithms discussed in §6.2. It would be incorrect to attribute the efficiency
of these quantum algorithms to the interference in the input register produced by the
Fourier transform. The role of the Fourier transform is simply to allow a measurement
in the computational basis to reveal which subspace representing the target disjunction
contains the state.

One might wonder, then, why the discrete Fourier transform is even necessary. We 
could, of course, simply perform an equivalent measurement in a different basis. But 
note that a computation would have to be performed to determine this basis. This 
raises the question of precisely how to assess the speed-up of a quantum algorithm 
relative to a rival classical algorithm. What are the relevant computational steps to be 
counted in making this assessment for a quantum computation? Since any sequence of 
unitary transformations is equivalent to a single unitary transformation, and a unitary 
transformation followed by a measurement in a certain basis is equivalent to simply 
performing a measurement in a different basis, any quantum computation can always 
be reduced to just one step: a measurement in a particular basis! 
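
The reduction to a single measurement can be written out explicitly. In the following sketch the symbols $U_1, \ldots, U_m$ (the successive unitaries), $|\psi\rangle$ (the initial state), and $|u_x\rangle$ (the rotated basis vectors) are notation introduced here for illustration, not taken from the text:

```latex
\Pr(x) \;=\; \bigl|\langle x | U | \psi \rangle\bigr|^{2}
       \;=\; \bigl|\langle u_x | \psi \rangle\bigr|^{2},
\qquad U := U_m \cdots U_2 U_1,
\qquad |u_x\rangle := U^{\dagger} |x\rangle .
```

Since U is unitary, the vectors $|u_x\rangle$ again form an orthonormal basis, so the outcome statistics of the whole computation coincide with those of one measurement of $|\psi\rangle$ in that basis. Writing down the basis, however, requires knowing U.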

Of course, this observation is hardly illuminating, since a computation at least as 
difficult as the original computation would have to be performed to determine the re- 
quired basis, but it does indicate that some convention is required about what steps to 
count in a quantum computation. The accepted convention is to require the unitary 
transformations in a quantum computation to be constructed from elementary quantum 
gates that form a universal set (e.g., the CNOT gate, the Hadamard gate, the phase gate, 
and the π/8 gate discussed earlier) and to count each such gate as one step. In addition,
all measurements are required to be performed in the computational basis, and these are 
counted as additional steps. The final discrete Fourier transforms in the Deutsch-Jozsa 
algorithm, Simon's algorithm, and Shor's algorithm are indispensable in transforming 
the state so that the algorithms can be completed by measurements in the computational 
basis, and it is an important feature of these algorithms that the Fourier transform can 
be implemented efficiently with elementary unitary gates. To claim that a quantum 
algorithm is exponentially faster than a classical algorithm is to claim that the number 
of steps counted in this way for the quantum algorithm is a polynomial function of the 
size of the input (the number of qubits required to store the input), while the classical 
algorithm involves a number of steps that increases exponentially with the size of the 
input (the number of bits required to store the input). 
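
As a rough illustration of this counting convention, the sketch below tallies the gates in the familiar textbook circuit for the quantum Fourier transform on n qubits: n Hadamard gates, n(n-1)/2 controlled phase rotations, and about n/2 swaps. It assumes, for simplicity, that each controlled rotation counts as a single step, although these rotations are not themselves members of the universal set named above. The comparison with the number of entries in the dense $2^n \times 2^n$ transform matrix is only meant to convey the difference in scaling; it is not a statement about the best classical algorithm.

```python
def qft_gate_count(n):
    """Gate count of the textbook QFT circuit on n qubits (assumed convention:
    every Hadamard, controlled rotation, and swap counts as one step)."""
    hadamards = n
    controlled_rotations = n * (n - 1) // 2
    swaps = n // 2
    return hadamards + controlled_rotations + swaps


def dense_matrix_entries(n):
    """Number of entries in the 2**n by 2**n Fourier matrix."""
    return (2 ** n) ** 2


for n in (4, 8, 16, 32):
    print(n, qft_gate_count(n), dense_matrix_entries(n))
```

The first count grows quadratically in n, the second exponentially.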

7 Quantum Foundations from the Perspective of Quantum Information



Does the extension of the classical theory of information to quantum states shed new
light on the foundational problems of quantum mechanics underlying the Bohr-Einstein
debate mentioned in §1, in particular the measurement problem? Researchers in the
area of quantum information and quantum computation often suggest a positive answer
to this question, with a promissory note for how the story is supposed to go. More fully
worked-out (generally, rather different) positive responses have been proposed by various
authors, notably Fuchs [2001b, 2002a, 2002b, 2001a] and Brukner and Zeilinger
[2001, 2002]. For a very thorough analysis and critique of the Brukner-Zeilinger position,
see [Timpson, 2004]. See also Hall [2000] and the response by Brukner and
Zeilinger [2000]. Here I shall limit my discussion to the significance of a characterization
theorem for quantum mechanics in terms of information-theoretic constraints by
Clifton, Bub, and Halvorson (CBH) [2003].

7.1 The CBH Characterization Theorem 

CBH showed that one can derive the basic kinematic features of a quantum description
of physical systems from three fundamental information-theoretic constraints: 

• the impossibility of superluminal information transfer between two physical sys- 
tems by performing measurements on one of them, 

• the impossibility of perfectly broadcasting the information contained in an un- 
known physical state (which, for pure states, amounts to 'no cloning'), 

• the impossibility of communicating information so as to implement a bit com- 
mitment protocol with unconditional security (so that cheating is in principle 
excluded by the theory). 

More precisely, CBH formulate these information-theoretic constraints in the general
framework of C*-algebras, which allows a mathematically abstract characterization
of a physical theory that includes, as special cases, all classical mechanical theo-
ries of both wave and particle varieties, and all variations on quantum theory, including 
quantum field theories (plus any hybrids of these theories, such as theories with supers- 
election rules). Within this framework, CBH show that the three information-theoretic 
constraints jointly entail three physical conditions that they take as definitive of what it 
means to be a quantum theory in the most general sense. Specifically, the information- 
theoretic constraints entail that: 

• the algebras of observables pertaining to distinct physical systems commute (a 
condition usually called microcausality or, to use Summers' term [Summers, 1990],
kinematic independence), 

• any individual system's algebra of observables is noncommutative,

• the physical world is nonlocal, in that spacelike separated systems can occupy 
entangled states that persist as the systems separate. 

CBH also partly demonstrated the converse derivation, leaving open a question
concerning nonlocality and bit commitment. This remaining issue was later resolved by
Hans Halvorson [2004], so the CBH theorem is a characterization theorem for quantum
theory in terms of the three information-theoretic constraints.

Note that the C*-algebraic framework is not restricted to the standard quantum
mechanics of a system represented on a single Hilbert space with a unitary dynamics,
but is general enough to cover cases of systems with an infinite number of degrees of
freedom that arise in quantum field theory and the thermodynamic limit of quantum
statistical mechanics (in which the number of microsystems and the volume they occupy
go to infinity, while the density defined by their ratio remains constant). The
C*-algebraic framework has even been applied to quantum field theory on curved
spacetime and so is applicable to the quantum theoretical description of exotic phenomena
such as Hawking radiation (black hole evaporation); see [Wald, 1984]. The Stone-von
Neumann theorem, which guarantees the existence of a unique representation (up to 
unitary equivalence) of the canonical commutation relations for systems with a finite 
number of degrees of freedom, breaks down for such cases, and there will be many 
unitarily inequivalent representations of the canonical commutation relations. 

One could, of course, consider weaker mathematical structures, but it seems that 
the C*-algebraic machinery suffices for all physical theories that have been found to 
be empirically successful to date, including phase space theories and Hilbert space
theories [Landsman, 1998], and theories based on a manifold [Connes, 1994]. For further
discussion of this point, see Halvorson and Bub [2005]. See also Halvorson (this vol.,
chap. 8), Emch (this vol., ch. 8), and Landsman (this vol., ch. 5). 

A C*-algebra is essentially an abstract generalization of the structure of the algebra
of operators on a Hilbert space. Technically, a (unital) C*-algebra is a Banach *-algebra
over the complex numbers containing the identity, where the involution operation * and
the norm are related by $\|A^*A\| = \|A\|^2$. So the algebra $\mathfrak{B}(\mathcal{H})$ of all bounded operators
on a Hilbert space $\mathcal{H}$ is a C*-algebra, with * the adjoint operation and $\|\cdot\|$ the standard
operator norm.
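
The norm condition can be checked numerically in the simplest nontrivial case. The sketch below is restricted, by assumption, to 2x2 complex matrices, for which the operator norm has a closed form; the matrix A is an arbitrary example, and the helper names are illustrative.

```python
from math import sqrt


def dagger(a):
    """Adjoint (conjugate transpose) of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def largest_eigenvalue_hermitian(h):
    """Largest eigenvalue of a 2x2 Hermitian matrix, in closed form."""
    a, d = h[0][0].real, h[1][1].real
    b = abs(h[0][1])
    return (a + d) / 2 + sqrt(((a - d) / 2) ** 2 + b ** 2)


def op_norm(a):
    """Operator norm: square root of the largest eigenvalue of A*A."""
    return sqrt(largest_eigenvalue_hermitian(matmul(dagger(a), a)))


A = [[1 + 2j, 0.5j], [3 + 0j, -1 + 1j]]
lhs = op_norm(matmul(dagger(A), A))   # ||A*A||
rhs = op_norm(A) ** 2                 # ||A||^2
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding.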

In standard quantum theory, as discussed in §3.1.1, a state on $\mathfrak{B}(\mathcal{H})$ is defined by a
density operator $\rho$ on $\mathcal{H}$ in terms of an expectation-valued functional $\hat{\rho}(A) = \mathrm{Tr}(\rho A)$
for all observables represented by self-adjoint operators A in $\mathfrak{B}(\mathcal{H})$. This definition
of $\hat{\rho}(A)$ in terms of $\rho$ yields a positive normalized linear functional. So a state on a
C*-algebra $\mathfrak{C}$ is defined, quite generally, as any positive normalized linear functional
$\hat{\rho} : \mathfrak{C} \to \mathbb{C}$ on the algebra. Pure states can be defined by the condition that if
$\hat{\rho} = \lambda\hat{\rho}_1 + (1 - \lambda)\hat{\rho}_2$ with $\lambda \in (0,1)$, then $\hat{\rho} = \hat{\rho}_1 = \hat{\rho}_2$; other states are mixed. In
the following, we drop the '^' in $\hat{\rho}$, but note that a C*-algebraic state $\rho$ is a positive
linear functional on $\mathfrak{C}$, while the density operator of standard quantum mechanics is an
element of $\mathfrak{C} = \mathfrak{B}(\mathcal{H})$.
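
To make the distinction between a density operator and the functional it defines concrete, here is a minimal sketch for a single qubit. The helper names and the particular matrices are illustrative assumptions; the functional is represented as a Python function from 2x2 matrices to numbers.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def state(density):
    """The functional A -> Tr(density A) defined by a density matrix."""
    return lambda A: trace(matmul(density, A))


def mixture(lam, s1, s2):
    """Convex combination lam*s1 + (1 - lam)*s2 of two states."""
    return lambda A: lam * s1(A) + (1 - lam) * s2(A)


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

up = state([[1, 0], [0, 0]])
down = state([[0, 0], [0, 1]])
plus = state([[0.5, 0.5], [0.5, 0.5]])
mixed = mixture(0.5, up, down)
maximally_mixed = state([[0.5, 0], [0, 0.5]])

print(mixed(I2), mixed(Z), mixed(X), plus(X))
```

The functional `mixed` is normalized and agrees with `maximally_mixed` on these observables, while `plus` and `mixed` are told apart by X.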

By Gleason's theorem [Gleason, 1957], every countably additive C*-algebraic state in
this sense on a C*-algebra $\mathfrak{C} = \mathfrak{B}(\mathcal{H})$ is given by a density operator in $\mathfrak{B}(\mathcal{H})$. However, because
countable additivity is not presupposed by the C*-algebraic notion of state (and, therefore,
Gleason's theorem does not apply in general), there can be pure states of $\mathfrak{B}(\mathcal{H})$
that are not representable by vectors in $\mathcal{H}$. In fact, if A is any self-adjoint element of a
C*-algebra $\mathfrak{A}$, and $a \in \mathrm{sp}(A)$, then there always exists a pure state $\rho$ of $\mathfrak{A}$ that assigns
a dispersion-free value of a to A [Kadison and Ringrose, 1997, Ex. 4.6.31]. Since this
is true even when we consider a point in the continuous spectrum of a self-adjoint operator
A acting on a Hilbert space, without any corresponding eigenvector, it follows that
there are pure states of $\mathfrak{B}(\mathcal{H})$ in the C*-algebraic sense that cannot be vector states
(nor, in fact, representable by any density operator on $\mathcal{H}$).

As we saw in §3.1.3, the general evolution of a quantum system resulting from a
combination of unitary interactions and selective or nonselective measurements can be
described by a quantum operation, i.e., a completely positive linear map. Accordingly,
a completely positive linear map $T : \mathfrak{C} \to \mathfrak{C}$, where $0 \le T(I) \le I$, is taken as describing
the general evolution of a system represented by a C*-algebra of observables.
The map or operation T is called selective if $T(I) < I$ and nonselective if $T(I) = I$.
Recall that a yes-no measurement of some idempotent observable represented by a
projection operator P is an example of a selective operation. Here, $T(A) = PAP$
for all A in the C*-algebra $\mathfrak{C}$, and $\rho^T$, the transformed ('collapsed') state, is the final
state obtained after measuring P in the state $\rho$ and ignoring all elements of the ensemble
that do not yield the eigenvalue 1 of P (so $\rho^T(A) = \rho(T(A))/\rho(T(I))$ when
$\rho(T(I)) \neq 0$, and $\rho^T = 0$ otherwise). The time evolution in the Heisenberg picture
induced by a unitary operator $U \in \mathfrak{C}$ is an example of a nonselective operation.
Here, $T(A) = UAU^{-1}$. Similarly, the measurement of an observable O with spectral
measure $\{P_i\}$, without selecting a particular outcome, is an example of a nonselective
operation, with $T(A) = \sum_{i=1}^{n} P_i A P_i$. As in the standard quantum theory of a system
with a finite-dimensional Hilbert space (cf. Eq. (61) of §3.1.3), any completely positive
linear map can be regarded as the restriction to a local system of a unitary map on a
larger system.
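
The difference between selective and nonselective operations can be illustrated for a single qubit. In this sketch the projections P0 and P1 onto the computational basis states, the test state, and all function names are assumptions made for the example.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def state(density):
    """The functional A -> Tr(density A)."""
    return lambda A: sum(matmul(density, A)[i][i] for i in range(2))


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]
P1 = [[0, 0], [0, 1]]


def selective(P):
    """T(A) = P A P: keep only the systems that yield eigenvalue 1 of P."""
    return lambda A: matmul(matmul(P, A), P)


def nonselective(projections):
    """T(A) = sum_i P_i A P_i: measure without selecting an outcome."""
    def T(A):
        total = [[0, 0], [0, 0]]
        for P in projections:
            total = add(total, matmul(matmul(P, A), P))
        return total
    return T


def transformed(rho, T):
    """The state rho^T(A) = rho(T(A)) / rho(T(I)), or 0 if rho(T(I)) = 0."""
    weight = rho(T(I2))
    if weight == 0:
        return lambda A: 0
    return lambda A: rho(T(A)) / weight


plus = state([[0.5, 0.5], [0.5, 0.5]])
collapse = selective(P0)
dephase = nonselective([P0, P1])

print(collapse(I2), dephase(I2))
print(transformed(plus, collapse)(Z), plus(X), transformed(plus, dephase)(X))
```

Here `collapse(I2)` is P0, which is strictly below the identity, while `dephase(I2)` is the identity; the nonselective measurement nevertheless changes the state, since the expectation value of X drops from 1 to 0.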

A representation of a C*-algebra $\mathfrak{C}$ is any mapping $\pi : \mathfrak{C} \to \mathfrak{B}(\mathcal{H})$ that preserves
the linear, product, and * structure of $\mathfrak{C}$. The representation is faithful if $\pi$ is one-to-one,
in which case $\pi(\mathfrak{C})$ is an isomorphic copy of $\mathfrak{C}$. The Gelfand-Naimark theorem says
that every abstract C*-algebra has a concrete faithful representation as a norm-closed
*-subalgebra of $\mathfrak{B}(\mathcal{H})$, for some appropriate Hilbert space $\mathcal{H}$. As indicated above, in
the case of systems with an infinite number of degrees of freedom (e.g., quantum field
theory), there are inequivalent representations of the C*-algebra of observables defined
by the commutation relations.

Every classical phase space theory defines a commutative C*-algebra. For example,
the observables of a classical system of n particles (the real-valued continuous
functions on the phase space $\mathbb{R}^{6n}$) can be represented as the self-adjoint elements of
the C*-algebra $\mathfrak{B}(\mathbb{R}^{6n})$ of all continuous complex-valued functions f on $\mathbb{R}^{6n}$. The
phase space $\mathbb{R}^{6n}$ is locally compact and can be made compact by adding just one point
'at infinity,' or we can simply consider a closed and bounded (and thus compact) subset of $\mathbb{R}^{6n}$.
The statistical states of the system are given by probability measures $\mu$ on $\mathbb{R}^{6n}$, and
pure states, corresponding to maximally complete information about the particles, are
given by the individual points of $\mathbb{R}^{6n}$. The system's state $\rho$ in the C*-algebraic sense
is the expectation functional corresponding to $\mu$, defined by $\rho(f) = \int_{\mathbb{R}^{6n}} f \, d\mu$.
Conversely [Kadison and Ringrose, 1997, Thm. 4.4.3], every commutative C*-algebra $\mathfrak{C}$
is isomorphic to the set C(X) of all continuous complex-valued functions on a locally
compact Hausdorff space X defined by the pure states of $\mathfrak{C}$. If $\mathfrak{C}$ has a multiplicative
identity, the 'phase space' X is compact. In this 'function representation' of $\mathfrak{C}$, the
isomorphism maps an element $C \in \mathfrak{C}$ to the function $\hat{C}$ (the Gelfand transformation
of C) whose value at any $\rho$ is just the (dispersion-free) value that $\rho$ assigns to C. So
'behind' every abstract commutative C*-algebra there is a classical phase space theory
defined by its function representation on the phase space X. This representation theorem
(and its converse) justifies characterizing a C*-algebraic theory as classical just in
case its algebra is commutative.
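
A finite toy model shows the structure at work. The sketch assumes a hypothetical three-point 'phase space' in place of $\mathbb{R}^{6n}$, so that observables are simply tables of values, multiplication is pointwise, pure states are evaluations at points, and statistical states are probability weights; none of these names comes from the text.

```python
X = ["x1", "x2", "x3"]   # hypothetical three-point phase space


def multiply(f, g):
    """Pointwise product of two observables: manifestly commutative."""
    return {x: f[x] * g[x] for x in X}


def pure_state(x):
    """Evaluation at the point x."""
    return lambda f: f[x]


def statistical_state(mu):
    """Expectation functional of the probability weights mu."""
    return lambda f: sum(mu[x] * f[x] for x in X)


def dispersion(state, f):
    """state(f^2) - state(f)^2, which vanishes for dispersion-free states."""
    return state(multiply(f, f)) - state(f) ** 2


def gelfand_transform(f):
    """The function on pure states whose value at x is the value x assigns to f."""
    return {x: pure_state(x)(f) for x in X}


f = {"x1": 1.0, "x2": -2.0, "x3": 0.5}
g = {"x1": 3.0, "x2": 0.0, "x3": 4.0}
mu = {"x1": 0.5, "x2": 0.25, "x3": 0.25}

print(multiply(f, g) == multiply(g, f))
print(dispersion(pure_state("x2"), f), dispersion(statistical_state(mu), f))
print(gelfand_transform(f) == f)
```

Every pure state is dispersion-free on every observable, a statistical mixture generally is not, and the Gelfand transform simply returns the observable as a function on the points.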

As we saw above, CBH identify quantum theories with a certain subclass of noncommutative
C*-algebras, where the condition of kinematic independence is satisfied
by the algebras of observables of distinct systems and the states of spacelike separated 
systems are characterized by the sort of nonlocality associated with entanglement. 

To clarify the rationale for this characterization and the significance of the
information-theoretic constraints, consider a composite quantum system AB, consisting of two
subsystems, A and B. For simplicity, assume the systems are indistinguishable, so their
C*-algebras $\mathfrak{A}$ and $\mathfrak{B}$ are isomorphic. The observables of the component systems A
and B are represented by the self-adjoint elements of $\mathfrak{A}$ and $\mathfrak{B}$, respectively. Let $\mathfrak{A} \vee \mathfrak{B}$
denote the C*-algebra generated by $\mathfrak{A}$ and $\mathfrak{B}$. The physical states of A, B, and AB
are given by positive normalized linear functionals on their respective algebras that
encode the expectation values of all observables. To capture the idea that A and B are
physically distinct systems, CBH make the assumption that any state of $\mathfrak{A}$ is compatible
with any state of $\mathfrak{B}$, i.e., for any state $\rho_A$ of $\mathfrak{A}$ and $\rho_B$ of $\mathfrak{B}$, there is a state $\rho$ of
$\mathfrak{A} \vee \mathfrak{B}$ such that $\rho|_{\mathfrak{A}} = \rho_A$ and $\rho|_{\mathfrak{B}} = \rho_B$.

The sense of the 'no superluminal information transfer via measurement' constraint
is that when Alice and Bob, say, perform local measurements, Alice's measurements
can have no influence on the statistics for the outcomes of Bob's measurements, and
conversely. That is, merely performing a local measurement cannot, in and of itself,
convey any information to a physically distinct system, so that everything 'looks the
same' to that system after the measurement operation as before, in terms of the expectation
values for the outcomes of measurements. CBH show [2003, Thm. 1] that it
follows from this constraint that A and B are kinematically independent systems if they
are physically distinct in the above sense, i.e., every element of $\mathfrak{A}$ commutes pairwise
with every element of $\mathfrak{B}$. (More precisely, an operation T on $\mathfrak{A} \vee \mathfrak{B}$ conveys no
information to Bob just in case $(T^*\rho)|_{\mathfrak{B}} = \rho|_{\mathfrak{B}}$ for all states $\rho$ of $\mathfrak{B}$, where $T^*$ is the
map on the states, i.e., the positive linear functionals on $\mathfrak{A} \vee \mathfrak{B}$, induced by T. Clearly,
the kinematic independence of $\mathfrak{A}$ and $\mathfrak{B}$ entails that Alice's local measurement operations
cannot convey any information to Bob, i.e., $T(B) = \sum_{i=1}^{n} E_i^{1/2} B E_i^{1/2} = B$ for
$B \in \mathfrak{B}$ if T is implemented by a POVM in $\mathfrak{A}$. CBH prove that if Alice cannot convey
any information to Bob by performing local measurement operations, then $\mathfrak{A}$ and $\mathfrak{B}$
are kinematically independent.)
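
The easy direction of this argument, that commuting algebras rule out signalling by local measurement, can be checked directly for two qubits. The sketch below uses a projective measurement by Alice (a special case of a POVM, for which $E_i^{1/2} = E_i$) and identifies Alice's and Bob's observables with matrices of the form A ⊗ I and I ⊗ B; these identifications and all names are assumptions of the example.

```python
def kron(a, b):
    """Kronecker product, ordering basis states as (Alice, Bob)."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]


def commute(a, b):
    return matmul(a, b) == matmul(b, a)


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]
P1 = [[0, 0], [0, 1]]


def alice_measurement(A):
    """Nonselective measurement of Z by Alice: sum_i (P_i x I) A (P_i x I)."""
    out = [[0] * 4 for _ in range(4)]
    for P in (P0, P1):
        E = kron(P, I2)
        out = add(out, matmul(matmul(E, A), E))
    return out


bob_observable = kron(I2, X)     # acts on Bob's qubit only
joint_observable = kron(X, X)    # involves Alice's qubit as well

print(commute(kron(Z, I2), bob_observable))
print(alice_measurement(bob_observable) == bob_observable)
print(alice_measurement(joint_observable) == joint_observable)
```

Because T(B) = B holds as an operator identity for Bob's observables, their expectation values are unchanged in every state, whereas an observable involving Alice's own system is in general disturbed.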

The 'no broadcasting' condition now ensures that the individual algebras $\mathfrak{A}$ and $\mathfrak{B}$
are noncommutative. Recall that for pure states, broadcasting reduces to cloning, and
that in elementary quantum mechanics, neither cloning nor broadcasting is possible in
general (as discussed earlier). CBH show that broadcasting and cloning are always possible
for classical systems, i.e., in the commutative case there is a universal broadcasting
map that clones any pair of input pure states and broadcasts any pair of input mixed
states [Clifton et al., 2003, Thm. 2]. Conversely, they show that if any two states can
be (perfectly) broadcast, then any two pure states can be cloned; and if two pure states
of a C*-algebra can be cloned, then they must be orthogonal. So, if any two states can
be broadcast, then all pure states are orthogonal, from which it follows that the algebra
is commutative.
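
For orientation, the step from cloning to orthogonality has a familiar counterpart in elementary Hilbert space quantum mechanics. The following is the standard unitary-cloning argument, not CBH's C*-algebraic proof; the cloning unitary U and the blank state $|0\rangle$ are notation assumed for the illustration:

```latex
\begin{align*}
U\bigl(|\psi\rangle \otimes |0\rangle\bigr) &= |\psi\rangle \otimes |\psi\rangle, \\
U\bigl(|\phi\rangle \otimes |0\rangle\bigr) &= |\phi\rangle \otimes |\phi\rangle, \\
\langle\psi|\phi\rangle \, \langle 0|0\rangle &= \langle\psi|\phi\rangle^{2}
  \quad \text{(unitarity preserves inner products)}, \\
\langle\psi|\phi\rangle &\in \{0, 1\}.
\end{align*}
```

Two distinct pure states can therefore be cloned by the same device only if they are orthogonal.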

The quantum mechanical phenomenon of interference is the physical manifestation 
of the noncommutativity of quantum observables or, equivalently, the superposition of 
quantum states. So the impossibility of perfectly broadcasting the information con- 
tained in an unknown physical state, or of cloning or copying the information in an 
unknown pure state, is the information-theoretic counterpart of interference. 

Now, if $\mathfrak{A}$ and $\mathfrak{B}$ are noncommutative and mutually commuting, it can be shown
that there are nonlocal entangled states on the C*-algebra $\mathfrak{A} \vee \mathfrak{B}$ they generate (see
[Landau, 1987; Summers, 1990; Bacciagaluppi, 1994], and, more relevantly here, in
terms of a specification of the range of entangled states that can be guaranteed to exist,
[Halvorson, 2004]). So it seems that entanglement (what Schrödinger [1935, p. 555]
identified as 'the characteristic trait of quantum mechanics, the one that enforces its
entire departure from classical lines of thought,' as we saw in §4.1) follows automatically
in any theory with a noncommutative algebra of observables. That is, it seems
that once we assume 'no superluminal information transfer via measurement,' and 'no 
broadcasting,' the class of allowable physical theories is restricted to those theories in 
which physical systems manifest both interference and nonlocal entanglement. But in 
terms of physical interpretation this conclusion is a bit too quick, since the derivation 
of entangled states depends on formal properties of the C*-algebraic machinery. More- 
over, we have no assurance that two systems in an entangled state will maintain their 
entanglement indefinitely as they separate in space, which is the case for quantum en- 
tanglement. But this is precisely what is required by the cheating strategy that thwarts 
secure bit commitment, since Alice will have to keep one system of such a pair and 
send the other system to Bob, whose degree of spatial separation from Alice is irrel- 
evant, in principle, to the implementation of the protocol. In an information-theoretic 
characterization of quantum theory, the fact that entangled states of composite systems 
can be instantiated, and instantiated nonlocally so that the entanglement of the composite
system is maintained as the subsystems separate in space, should be shown to follow 
from some information-theoretic principle. The role of the 'no bit commitment' con- 
straint is to guarantee the persistence of entanglement over distance, i.e., the existence 
of a certain class of nonlocal entangled states — hence it gives us nonlocality, not merely 
'holism.' 

As shown in §5.2, unconditionally secure quantum bit commitment is impossible
because a generalized version of the EPR cheating strategy can always be applied by
introducing additional ancilla particles and enlarging the Hilbert space in a suitable
way. That is, for a quantum mechanical system consisting of two (separated) subsystems
represented by the C*-algebra $\mathfrak{B}(\mathcal{H}_1) \otimes \mathfrak{B}(\mathcal{H}_2)$, any mixture of states on $\mathfrak{B}(\mathcal{H}_2)$
can be generated from a distance by performing an appropriate generalized measurement
on the system represented by $\mathfrak{B}(\mathcal{H}_1)$, for an appropriate entangled state of the
composite system. This is what Schrödinger called 'remote steering' and found so
physically counterintuitive that he speculated [1936, p. 451] (wrongly, as it turned
out) that experimental evidence would eventually show that this was simply an artefact
of the theory, not instantiated in our world. He suggested that an entangled state of
a composite system would almost instantaneously decay to a mixture as the component
systems separated. There would still be correlations between the states of the
component systems, but remote steering would no longer be possible.

It seems worth noticing that the [EPR] paradox could be avoided by a very 
simple assumption, namely if the situation after separating were described 
by [the entangled state $\Psi(x, y) = \sum_k a_k g_k(x) f_k(y)$], but with the additional
statement that the knowledge of the phase relations between the
complex constants $a_k$ has been entirely lost in consequence of the process
of separation. This would mean that not only the parts, but the whole sys- 
tem, would be in the situation of a mixture, not of a pure state. It would not 
preclude the possibility of determining the state of the first system by suit- 
able measurements in the second one or vice versa. But it would utterly 
eliminate the experimenters influence on the state of that system which he 
does not touch. 

Schrödinger regarded the phenomenon of interference associated with noncommutativity
in quantum mechanics as unproblematic, because he saw this as reflecting
the fact that particles are wavelike. But he did not believe that we live in a
world in which physical systems can exist nonlocally in entangled states, because such
states would allow Alice to steer Bob's system into any mixture of pure states compatible
with Bob's reduced density operator and he did not expect that experiments
would bear this out. Of course, it was an experimental question in 1935 whether
Schrödinger's conjecture was correct or not. We now know that the conjecture is
false. A wealth of experimental evidence, including the confirmed violations of Bell's
inequality [Aspect et al., 1981; Aspect et al., 1982] and the confirmations of quantum
teleportation [Bouwmeester et al., 1997; Boschi et al., 1998; Furusawa et al., 1998; Nielsen et al., 1998],
testify to this. The relevance of Schrödinger's conjecture here is this: it raises the
possibility of a quantum-like world in which there is interference but no nonlocal
entanglement. Can we exclude this possibility on information-theoretic grounds?

Now although unconditionally secure bit commitment is no less impossible for 
classical systems, in which the algebras of observables are commutative, than for quan- 
tum systems, the insecurity of a bit commitment protocol in a noncommutative setting 
depends on considerations entirely different from those in a classical commutative
setting. As we saw in §5.2, the security of a classical bit commitment protocol is a matter
of computational complexity and cannot be unconditional. 

By contrast, suppose that, as Schrödinger speculated, we lived in a world in which the
algebras of observables are noncommutative but composite physical systems cannot exist in
nonlocal entangled states. If Alice sends Bob one of two mixtures associated with the
same density operator to establish her commitment, then she is, in effect, sending Bob
evidence for the truth of an exclusive disjunction that is not based on the selection
of a particular disjunct. (Bob's reduced density operator is associated ambiguously
with both mixtures, and hence with the truth of the exclusive disjunction: '0 or 1'.)
Noncommutativity allows the possibility of different mixtures associated with the same
density operator. What thwarts the possibility of using the ambiguity of mixtures in this
way to implement an unconditionally secure bit commitment protocol is the existence
of nonlocal entangled states between Alice and Bob. This allows Alice to cheat by
preparing a suitable entangled state instead of one of the mixtures, where the reduced
density operator for Bob is the same as that of the mixture. Alice is then able to steer
Bob's systems remotely into either of the two mixtures associated with the alternative
commitments at will.

20 A similar possibility was raised and rejected by Furry [1936].
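
The simplest instance of such steering can be computed by hand or with the following sketch. It assumes a single shared pair in the state (|00> + |11>)/√2 with real amplitudes, and two alternative measurements by Alice (in the bases labelled Z_BASIS and X_BASIS here); the full cheating strategy of §5.2 involves ancillas and is not modelled.

```python
from math import sqrt

S = 1 / sqrt(2)
BELL = [S, 0.0, 0.0, S]          # amplitudes of |00>, |01>, |10>, |11>

Z_BASIS = [[1.0, 0.0], [0.0, 1.0]]
X_BASIS = [[S, S], [S, -S]]


def steer(joint, alice_basis):
    """Ensemble of (probability, Bob state) pairs that results when Alice
    measures her qubit in the given orthonormal basis (real amplitudes)."""
    ensemble = []
    for e in alice_basis:
        # Project Alice's qubit onto e; the amplitude of |a b> sits at 2*a + b.
        bob = [sum(e[a] * joint[2 * a + b] for a in range(2)) for b in range(2)]
        prob = sum(x * x for x in bob)
        if prob > 0:
            ensemble.append((prob, [x / sqrt(prob) for x in bob]))
    return ensemble


def density(ensemble):
    """Density matrix sum_k p_k |v_k><v_k| of an ensemble of real vectors."""
    return [[sum(p * v[i] * v[j] for p, v in ensemble) for j in range(2)]
            for i in range(2)]


z_mixture = steer(BELL, Z_BASIS)   # Bob holds |0> or |1>
x_mixture = steer(BELL, X_BASIS)   # Bob holds |+> or |->
print(z_mixture)
print(x_mixture)
print(density(z_mixture), density(x_mixture))
```

The two ensembles consist of different pure states, yet both average to the same density matrix for Bob, so nothing Bob can measure locally reveals which mixture Alice has chosen to produce.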

So what would allow unconditionally secure bit commitment in a noncommuta- 
tive theory is the absence of physically occupied nonlocal entangled states, or the 
spontaneous destruction of entanglement as systems separate. One can therefore take 
Schrödinger's remarks as relevant to the question of whether or not secure bit commitment
is possible in our world. In effect, Schrödinger raised the possibility that we live
in a quantum-like world in which unconditionally secure bit commitment is possible! 
It follows that the impossibility of unconditionally secure bit commitment entails that, 
for any mixed state that Alice and Bob can prepare by following some (bit commit- 
ment) protocol, there is a corresponding nonlocal entangled state that can be physically 
occupied by Alice's and Bob's particles and persists indefinitely as the particles move
apart. 

To sum up: the content of the CBH theorem is that a quantum theory (a C*-algebraic
theory whose observables and states satisfy conditions of kinematic independence,
noncommutativity, and nonlocality) can be characterized by three information-
theoretic constraints: no superluminal communication of information via measure- 
ment, no (perfect) broadcasting, and no (unconditionally secure) bit commitment. 

7.2 Quantum Mechanics as a Theory of Information 

Consider Einstein's view,21 mentioned in §1, that quantum mechanics is incomplete.
Essentially, Einstein based his argument for this claim on the demand that a complete
physical theory should satisfy certain principles of realism (essentially, a locality
principle and a separability principle), which amounts to the demand that statistical
correlations between spatially separated systems should have a common causal explanation
in terms of causal factors obtaining at the common origin of the systems. Roughly
thirty years after the publication of the Einstein-Podolsky-Rosen paper [1935], John
Bell [1964] showed that the statistical correlations of the entangled Einstein-Podolsky-Rosen
state for spatially separated particles are inconsistent with any explanation in
terms of a classical probability distribution over common causal factors originating at
the source of the particles before they separate. But the fact that quantum mechanics
allows the possibility of correlations that are not reducible to common causes is
a virtue of the theory. It is precisely the nonclassical correlations of entangled states 
that underlie the possibility of an exponential speed-up of quantum computation over 
classical computation, the possibility of unconditionally secure key distribution but the 
impossibility of unconditionally secure quantum bit commitment, and phenomena such 
as quantum teleportation and other nonclassical entanglement-assisted communication 
protocols. 

While Einstein's argument for incompleteness fails, there is another sense, also 

21 The following discussion is adapted from [Bub, 2004] and [Bub, 2005], but the argument here is
developed somewhat differently.
associated with entangled states, in which quantum mechanics might be said to be incomplete.
In a typical (idealized) quantum mechanical measurement interaction, say
an interaction in which the two possible values, 0 and 1, of an observable of a qubit in
a certain quantum state become correlated with the two possible positions of a macroscopic
pointer observable, $p_0$ and $p_1$, the final state is an entangled state, a linear
superposition of the states $|0\rangle|p_0\rangle$ and $|1\rangle|p_1\rangle$, with coefficients derived from the initial
quantum state of the qubit. To dramatize the problem, Schrödinger [1936] considered
the case where $|p_0\rangle$ and $|p_1\rangle$ represent the states of a cat being alive and a cat being
dead in a closed box, which is only opened by the observer some time after the measurement
interaction. On the standard way of relating the quantum state of a system to
what propositions about the system are determinately (definitely) true or false, and what 
propositions have no determinate truth value, some correlational proposition about the 
composite system (microsystem + cat) is true in this entangled state, but the proposi- 
tions asserting that the cat is alive (and the value of the qubit observable is 0), or that 
the cat is dead (and the value of the qubit observable is 1), are assigned no determinate 
truth value. Moreover, if we assume that the quantum propositions form an algebraic 
structure isomorphic to the structure of subspaces of the Hilbert space of the composite 
system — the representational space for quantum states and observables — then it is easy 
to derive a formal contradiction from the assumption that the correlational proposition 
corresponding to the entangled state is true, and that the cat is either definitely alive 
or definitely dead. 22 Schrödinger thought that it was absurd to suppose that quantum
mechanics requires us to say that the cat in such a situation (a macrosystem) is neither
alive nor dead (does not have a determinate macroproperty of this sort) until an
observer opens the box and looks, in which case the entangled state 'collapses' nonlinearly
and stochastically, with probabilities given by the initial quantum state of the
microsystem, onto a product of terms representing a definite state of the cat and a definite
state of the microsystem. Einstein [1967, p. 39] concurred and remarked in a
letter to Schrödinger: 'If that were so then physics could only claim the interest of
shopkeepers and engineers; the whole thing would be a wretched bungle.'
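
As an illustrative aside, the following minimal Python sketch checks that the post-measurement state discussed above, $\alpha|0\rangle|p_0\rangle + \beta|1\rangle|p_1\rangle$, cannot be written as a product of a qubit state and a pointer state when both coefficients are nonzero. It assumes a two-dimensional pointer space and uses the standard criterion that a pure state of two 2-level systems factorizes exactly when the determinant of its 2x2 coefficient matrix vanishes. The function names and the sample amplitudes are hypothetical and are not taken from the text.

```python
import math

def coefficient_matrix(alpha, beta):
    """Coefficients c[i][j] of the state sum_ij c[i][j] |i>|p_j>
    for the post-measurement state alpha|0>|p0> + beta|1>|p1>."""
    return [[alpha, 0.0], [0.0, beta]]

def is_product_state(c, tol=1e-12):
    """A pure state of two 2-level systems factorizes iff det(c) == 0."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return abs(det) < tol

# Assumed sample amplitudes: an equal superposition of the qubit values 0 and 1.
alpha = beta = 1 / math.sqrt(2)
entangled = not is_product_state(coefficient_matrix(alpha, beta))

# If the qubit starts in the eigenstate |0>, the final state |0>|p0> is a product.
definite = is_product_state(coefficient_matrix(1.0, 0.0))
```

Only when the qubit starts in an eigenstate of the measured observable does the final state factorize, which is the case in which the pointer reading raises no puzzle.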

This is the standard 'measurement problem' of quantum mechanics. Admittedly,
the formulation of the problem is highly idealized, but the fundamental problem arises 
from the way in which quantum mechanics represents correlations via entangled states 
and does not disappear entirely in less idealized formulations (even though the prob- 
lem is somewhat altered by considering the macroscopic nature of the instrument, and 
the role of the environment). (See Dickson, this vol., ch. 4, and [Bub, 1997] for further
discussion.) I shall refer to this problem (the Schrödinger incompleteness of the
theory) as Schrödinger's problem. It is a problem about truth (or the instantiation of
properties), as opposed to a distinct problem about probabilities. 

Before formulating the probability problem, consider what was involved in the tran- 
sition from classical to quantum mechanics. Quantum mechanics first appeared as 
Heisenberg's matrix mechanics in 1925, following the 'old quantum theory,' a patch- 
work of ad hoc modifications of classical mechanics to accommodate Planck's quan- 
tum postulate. Essentially, Heisenberg modified the kinematics of classical mechanics 

22 This also follows from the Bub-Clifton theorem discussed below. The sublattice of determinate quan- 
tum propositions defined by the identity and the EPR state is maximal: adding any proposition involves a 
contradiction. 
by replacing certain classical dynamical variables, like position and momentum, with
mathematical representatives (matrices) which do not commute. Shortly afterwards,
Schrödinger developed a wave mechanical version of quantum mechanics and proved
the formal equivalence of the two theories. It is common to understand the significance 
of the transition from classical to quantum mechanics in terms of 'wave-particle dual- 
ity,' the idea that a quantum system like an electron, unlike a classical system like a 
stone, manifests itself as a wave under certain circumstances and as a particle under 
other circumstances. This picture obscures far more than it illuminates. We can see 
more clearly what is going on conceptually if we consider the implications of Heisen- 
berg's move for the way we think about objects and their properties in the most general 
sense. 

Heisenberg replaced the commutative algebra of dynamical variables of classical
mechanics (position, momentum, angular momentum, energy, etc.) with a noncommutative
algebra. Some of these dynamical variables take the values 0 and 1 only and
correspond to properties. For example, we can represent the property of a particle being
in a certain region of space by a dynamical variable that takes the value 1 when
the particle is in the region and 0 otherwise. A dynamical variable like position corresponds
to a set of such 2-valued dynamical variables or physical properties. In the
case of the position of a particle, these are the properties associated with the particle
being in region R, for all regions R. If, for all regions R, you know whether or not the
particle is in that region, you know the position of the particle, and conversely. The 2-valued
dynamical variables or properties of a classical system form a Boolean algebra,
a subalgebra of the commutative algebra of dynamical variables.

Replacing the commutative algebra of dynamical variables with a noncommutative
algebra is equivalent to replacing the Boolean algebra of 2-valued dynamical variables
or properties with a non-Boolean algebra. The really essential thing about the classical
mode of representation of physical systems in relation to quantum mechanics is that
the properties of classical systems are represented as having the structure of a Boolean
algebra or Boolean lattice. Every Boolean lattice is isomorphic to a lattice of subsets
of a set. 23 To say that the properties of a classical system form a Boolean lattice is
to say that they can be represented as the subsets of a set, the phase space or state 
space of classical mechanics. To say that a physical system has a certain property is 
to associate the system with a certain set in a representation space where the elements 
of the space — the points of the set — represent all possible states of the system. A state 
picks out a collection of sets, the sets to which the point representing the state belongs, 
as the properties of the system in that state. The dynamics of classical mechanics is 
described in terms of a law of motion describing how the state moves in the state space. 
As the state changes with time, the set of properties selected by the state changes. (For 

23 A lattice is a partially ordered set in which every pair of elements has a greatest lower bound (or infimum) 
and least upper bound (or supremum) with respect to the ordering, a minimum element (denoted by 0), and 
a maximum element (denoted by 1). A Boolean lattice is a complemented, distributive lattice, i.e., every 
element has a complement (the lattice analogue of set-theoretic complementation) and the distributive law 
holds for the infimum and supremum. The partial ordering in a Boolean lattice represented by the subsets 
of a set X corresponds to the partial ordering defined by set inclusion, so the infimum corresponds to set 
intersection, the supremum corresponds to set union, 0 corresponds to the null set, and 1 corresponds to the
set X. A Boolean algebra, defined in terms of algebraic sum (+) and product (.) operations, is equivalent to
a Boolean lattice defined as a partially ordered structure. 
an elaboration, see [Hughes, 1995] and [Bub, 1997].)

So the transition from classical to quantum mechanics involves replacing the rep- 
resentation of properties as a Boolean lattice, i.e., as the subsets of a set, with the 
representation of properties as a certain sort of non-Boolean lattice. Dirac and von 
Neumann developed Schrödinger's equivalence proof into a representation theory for
the properties of quantum systems as subspaces in a linear vector space over the com- 
plex numbers: Hilbert space. The non-Boolean lattice in question is the lattice of sub- 
spaces of this space. Instead of representing properties as the subsets of a set, quantum 
mechanics represents properties as the subspaces of a linear space — as lines, or planes, 
or hyperplanes, i.e., as a projective geometry. Algebraically, this is the central struc- 
tural change in the transition from classical to quantum mechanics — although there is 
more to it: notably the fact that the state space for quantum systems is a Hilbert space 
over the complex numbers, not the reals, which is reflected in physical phenomena
associated with the possibility of superposing states with different relative phases. 
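
To make the contrast with a Boolean lattice concrete, here is a minimal Python sketch, offered as an illustration rather than as part of the original argument, showing that the distributive law fails in the lattice of subspaces of a plane. It assumes a real 2-dimensional space for simplicity (the text's Hilbert space is complex), represents lines through the origin by integer direction vectors, and uses hypothetical helper names `line`, `meet`, and `join`.

```python
from math import gcd

ZERO, PLANE = "zero", "plane"   # the 0-dimensional and 2-dimensional subspaces

def line(a, b):
    """The 1-dimensional subspace of R^2 spanned by the integer vector (a, b)."""
    g = gcd(a, b)
    a, b = a // g, b // g
    if a < 0 or (a == 0 and b < 0):   # fix the sign so each line has one label
        a, b = -a, -b
    return (a, b)

def meet(s, t):
    """Greatest lower bound: the intersection of two subspaces."""
    if s == t:
        return s
    if s == PLANE:
        return t
    if t == PLANE:
        return s
    return ZERO   # distinct lines (or anything paired with ZERO) share only 0

def join(s, t):
    """Least upper bound: the span of two subspaces."""
    if s == t:
        return s
    if s == ZERO:
        return t
    if t == ZERO:
        return s
    return PLANE  # distinct lines (or anything paired with PLANE) span the plane

a, b, c = line(1, 0), line(0, 1), line(1, 1)
lhs = meet(a, join(b, c))            # a AND (b OR c): the line a itself
rhs = join(meet(a, b), meet(a, c))   # (a AND b) OR (a AND c): the zero subspace
```

Because `lhs` and `rhs` differ, the three lines cannot be embedded in any lattice of subsets of a set with intersection and union as the lattice operations.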

Instead of talking about properties, we can talk equivalently about propositions. 
(We say that a given property is instantiated if and only if the corresponding proposi- 
tion is true.) In a Boolean propositional structure, there exist 2-valued homomorphisms 
on the structure that correspond to truth-value assignments to the propositions. In fact, 
each point in phase space — representing a classical state — defines a truth-value as- 
signment to the subsets representing the propositions: each subset to which the point 
belongs represents a true proposition or a property that is instantiated by the system, 
and each subset to which the point does not belong represents a false proposition or 
a property that is not instantiated by the system. So a classical state corresponds to a 
complete assignment of truth values to the propositions, or a maximal consistent 'list'
of properties of the system, and all possible states correspond to all possible maximal 
consistent lists. 
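
The claim that each phase space point defines a 2-valued homomorphism can be checked directly on a toy example. The sketch below is only illustrative: it assumes a finite 'phase space' of four points (real phase spaces are continuous), treats every subset as a property, and verifies that membership of a point respects complement, intersection, and union. All names are hypothetical.

```python
from itertools import combinations

# Toy 'phase space' with four points (an assumption made for illustration).
PHASE_SPACE = frozenset({"s1", "s2", "s3", "s4"})

def all_properties(space):
    """Every subset of the phase space represents a property."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def truth_value(state, prop):
    """1 if the property is instantiated in the given state, else 0."""
    return 1 if state in prop else 0

def is_homomorphism(state, space):
    """Check that the assignment respects 'not', 'and', and 'or'."""
    props = all_properties(space)
    for p in props:
        tp = truth_value(state, p)
        if truth_value(state, space - p) != 1 - tp:
            return False
        for q in props:
            tq = truth_value(state, q)
            if truth_value(state, p & q) != tp * tq:
                return False
            if truth_value(state, p | q) != max(tp, tq):
                return False
    return True

results = {s: is_homomorphism(s, PHASE_SPACE) for s in PHASE_SPACE}
```

Every point passes the check, so each of the four states yields a complete and consistent assignment of truth values to all sixteen properties.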

Probabilities can be introduced on such a classical property structure as measures 
on the subsets representing the properties. Since each phase space point defines a 
truth-value assignment, the probability of a property is the measure of the set of truth- 
value assignments that assign a 1 ('true') to the property — in effect, we 'count' (in the 
measure-theoretic sense) the relative number of state descriptions in which the property 
is instantiated (or the corresponding proposition is true), and this number represents the 
probability of the property. So it makes sense to interpret the probability of a property
as a measure of our ignorance as to whether or not the property is instantiated. Proba- 
bility distributions over classical states represented as phase space points are sometimes 
referred to as 'mixed states,' in which case states corresponding to phase space points 
are distinguished as 'pure states.' 
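
A small worked example may help fix the idea of probabilities as measures over truth-value assignments. The following Python sketch assumes a finite phase space of four points with made-up weights (a discrete stand-in for a measure on a continuous phase space); the function name `probability` and all numbers are hypothetical.

```python
from fractions import Fraction

# Assumed weights on four phase-space points (a 'mixed state'); they sum to 1.
weights = {"s1": Fraction(1, 2), "s2": Fraction(1, 4),
           "s3": Fraction(1, 8), "s4": Fraction(1, 8)}

def probability(prop, weights):
    """Measure of the set of states (truth-value assignments) that make prop true."""
    return sum((w for s, w in weights.items() if s in prop), Fraction(0))

p = frozenset({"s1", "s3"})
q = frozenset({"s2"})
prob_p = probability(p, weights)            # 1/2 + 1/8
prob_p_or_q = probability(p | q, weights)   # additive, since p and q are disjoint

# A 'pure state' puts all the weight on one point, so every property gets 0 or 1.
pure = {"s1": Fraction(1)}
pure_values = (probability(p, pure), probability(q, pure))
```

In the pure case the probabilities reduce to the truth values themselves, which is what licenses reading the mixed-state probabilities as expressions of ignorance about which pure state obtains.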

The problem for a quantum property structure, represented by the lattice of sub- 
spaces of a Hilbert space, arises because 2-valued homomorphisms do not exist on 
these structures (except in the special case of a 2-dimensional Hilbert space). If we take 
the subspace structure of Hilbert space seriously as the structural feature of quantum 
mechanics corresponding to the Boolean property structure or propositional structure 
of classical mechanics, the non-existence of 2-valued homomorphisms on the lattice of 
subspaces of a Hilbert space means that there is no partition of the totality of properties
of the associated quantum system into two sets: the properties that are instantiated
by the system, and the properties that are not instantiated by the system; i.e., there
is no partition of the totality of propositions into true propositions and false propositions.
(Of course, other ways of associating propositions with features of a Hilbert
space are possible, and other ways of assigning truth values, including multi-valued
truth value assignments and contextual truth value assignments. Ultimately, the issue 
here concerns what we take as the salient structural change involved in the transition 
from classical to quantum mechanics.) 

It might appear that, on the standard interpretation, a pure quantum state represented
by a 1-dimensional subspace in Hilbert space (a minimal element in the subspace
structure) defines a truth-value assignment on quantum propositions in an analogous
sense to the truth-value assignment on classical propositions defined by a pure
classical state. Specifically, on the standard interpretation, a pure quantum state selects
the propositions represented by subspaces containing the state as true, and the propositions
represented by subspaces orthogonal to the state as false. (Note that orthogonality is the 
analogue of set-complement, or negation, in the subspace structure; the set-theoretical 
complement of a subspace is not in general a subspace.) 

There is, however, an important difference between the two situations. In the case 
of a classical state, every possible property represented by a phase space subset is se- 
lected as either instantiated by the system or not; equivalently, every proposition is 
either true or false. In the case of a quantum state, the properties represented by Hilbert 
space subspaces are not partitioned into two such mutually exclusive and collectively 
exhaustive sets: some propositions are assigned no truth value. Only propositions rep- 
resented by subspaces that contain the state are assigned the value 'true,' and only 
propositions represented by subspaces orthogonal to the state are assigned the value 
'false.' This means that propositions represented by subspaces that are at some non- 
zero or non-orthogonal angle to the ray representing the quantum state are not assigned 
any truth value in the state, and the corresponding properties must be regarded as indeterminate
or indefinite: according to the theory, there can be no fact of the matter about
whether these properties are instantiated or not. 

It turns out that there is only one way to assign (generalized) probabilities to quan- 
tum properties, i.e., weights that satisfy the usual Kolmogorov axioms for a probability 
measure on Boolean sublattices of the non-Boolean lattice of quantum properties. This 
is the content of Gleason's theorem [Gleason, 1957]. For a quantum state $\rho$, a property
p represented by a projection operator P is assigned the probability $\mathrm{Tr}(\rho P)$. If $\rho$ is a
pure state $\rho = |\psi\rangle\langle\psi|$, the probability of p is $|\langle\psi_p|\psi\rangle|^2$, where $|\psi_p\rangle$ is the (normalized) orthogonal
projection of $|\psi\rangle$ onto the subspace P, i.e., the probability of p is the square of the
cosine of the angle between the ray and the subspace P. This means that properties
represented by subspaces containing the state are assigned probability 1, properties
represented by subspaces orthogonal to the state are assigned probability 0, and all
other properties, represented by subspaces at a non-zero or non-orthogonal angle to the
state, are assigned a probability between 0 and 1. So quantum probabilities are not represented
as measures over truth-value assignments and cannot be given an ignorance
interpretation in the obvious way. 
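
The 'square of the cosine' rule can be illustrated numerically. The sketch below is a simplified example under stated assumptions: it works in a real 3-dimensional space rather than a complex Hilbert space, takes the subspace P to be given by an orthonormal basis, and computes $\langle\psi|P|\psi\rangle$ as the sum of squared components of the state along that basis. The function name `born_probability` and the chosen angle are hypothetical.

```python
import math

def born_probability(psi, basis_of_P):
    """<psi|P|psi> for a real unit vector psi and an orthonormal basis of the subspace P."""
    return sum(sum(e_i * x_i for e_i, x_i in zip(e, psi)) ** 2 for e in basis_of_P)

theta = math.pi / 3                             # assumed angle between the ray and the x-axis
psi = (math.cos(theta), math.sin(theta), 0.0)   # unit vector representing the pure state

x_axis = [(1.0, 0.0, 0.0)]                      # 1-dimensional subspace at angle theta to psi
xy_plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # subspace containing psi
z_axis = [(0.0, 0.0, 1.0)]                      # subspace orthogonal to psi

p_angle = born_probability(psi, x_axis)         # equals cos(theta)**2
p_contains = born_probability(psi, xy_plane)    # equals 1 up to rounding
p_orthogonal = born_probability(psi, z_axis)    # equals 0
```

The three cases correspond to the three kinds of property distinguished above: indeterminate (probability strictly between 0 and 1), true (probability 1), and false (probability 0).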

The question now is: what do these 'angle probabilities' or, perhaps better, 'angle 
weights' mean? The orthodox answer is that the probability assigned to a property of a 
system by a quantum state is to be understood as the probability of finding the property 
in a measurement process designed to ascertain whether or not that property obtains. 
A little thought will reveal that this proposal is very problematic. When the system
is represented by a quantum state that assigns a certain property the probability 1/2, 
say, this property is indeterminate. Physicists would say that ascribing the property to 
the system in that state is 'meaningless.' But somehow it makes sense to design an 
experiment to ascertain whether or not the property is instantiated by the system. And 
in such a measurement, the probability is asserted to be 1/2 that the experiment will 
yield the answer 'yes,' and 1/2 that the experiment will yield the answer 'no.' Clearly, a 
measurement process in quantum mechanics is not simply a procedure for ascertaining 
whether or not a property is instantiated in any straightforward sense. Somehow, a 
measurement process enables an indeterminate property, that is neither instantiated nor 
not instantiated by a system in a given quantum state, to either instantiate itself or not 
with a certain probability; or equivalently, a proposition that is neither true nor false 
can become true or false with a certain probability in a suitable measurement process. 

The probability problem (as opposed to the truth problem, Schrödinger's problem)
is the problem of interpreting the 'angle weights' as probabilities in some sense (relative
frequencies? propensities? subjective Bayesian betting probabilities?) that does
not reduce to a purely instrumentalist interpretation of quantum mechanics, accord- 
ing to which the theory is simply regarded as a remarkably accurate instrument for 
prediction. (Recall Einstein's remark about quantum mechanics being of interest only
to shopkeepers and engineers on the Copenhagen interpretation.) The problem arises 
because of the unique way in which probabilities can be introduced in quantum me- 
chanics, and because the notion of measurement or observation is utterly mysterious 
on the Copenhagen interpretation. 

In classical theories, we measure to find out what we don't know, but in principle a
measurement does not change what is (and even if it does change what is, this is simply 
a change or disturbance from one state of being to another that can be derived on the 
basis of the classical theory itself). In quantum mechanics, measurements apparently 
bring into being something that was indeterminate, not merely unknown, before, i.e., a 
proposition that was neither true nor false becomes true in a measurement process, and 
the way in which this happens according to the theory is puzzling, given our deepest 
assumptions about objectivity, change, and intervention. 

Now, we know how to solve Schrödinger's problem, i.e., we know all the possible
ways of modifying quantum mechanics to solve this problem. The problem arises be- 
cause of the linear dynamics of the theory, which yields a certain entangled state as the 
outcome of a measurement interaction, and the interpretation of this entangled state as 
representing a state of affairs that makes certain propositions true, certain propositions 
false, and other propositions indeterminate. Either we change the linear dynamics in 
some way, or we keep the linear dynamics and say something non-orthodox about the 
relation between truth and indeterminateness and the quantum state. Both options have 
been explored in various ways and in great detail: we understand the solution space for
Schrödinger's problem, and the consequences of adopting a particular solution.

'Collapse' theories, like the theory developed by Ghirardi, Rimini, and Weber 
(GRW), and extended by Pearle [Ghirardi, 2002], solve the problem by modifying the
linear dynamics of quantum mechanics. (See Dickson, this vol., ch. 4, for an account.) 
In the modified theory, there is a certain very small probability that the wavefunction of 
a particle (the function defined by the components of the quantum state with respect to 
the position basis in Hilbert space) will spontaneously 'collapse' after being multiplied
by a peaked Gaussian of a specified width. For a macroscopic system consisting of 
many particles, this probability can be close to 1 for very short time intervals. In effect,
this collapse solution modifies the linear dynamics of standard quantum mechanics by
adding uncontrollable noise. When the stochastic terms of the modified dynamics be- 
come important at the mesoscopic and macroscopic levels, they tend to localize the 
wave function in space. So measurement interactions involving macroscopic pieces of 
equipment (or cats) can be distinguished from elementary quantum processes, insofar 
as they lead to the almost instantaneous collapse of the wave function and the correla- 
tion of the measured observable with the position of a localized macroscopic pointer 
observable. 
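
The amplification from a negligible single-particle collapse probability to near certainty for a macroscopic system can be made quantitative with a rough calculation. The sketch below is a back-of-the-envelope illustration, not the GRW dynamics itself: it assumes that collapses are independent Poisson events with the same rate for every particle, and it uses a rate of 1e-16 per second per particle, the order of magnitude usually quoted for GRW, which is not stated in the text. The particle number and time interval are likewise assumed.

```python
import math

def collapse_probability(n_particles, rate_per_particle, duration):
    """Probability of at least one spontaneous collapse among n_particles
    within the given duration, assuming independent Poisson processes."""
    return -math.expm1(-n_particles * rate_per_particle * duration)

RATE = 1e-16   # collapses per second per particle (assumed order of magnitude)

single = collapse_probability(1, RATE, 1.0)            # one particle, one second
macroscopic = collapse_probability(1e23, RATE, 1e-5)   # ~1e23 particles, 10 microseconds
```

Under these assumptions an isolated particle is essentially never affected, while a pointer made of some 1e23 particles is localized within microseconds.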

'No collapse' solutions are constrained by certain 'no go' theorems that restrict
the assignment of properties, or values to observables, under very general assumptions
about the algebra of observables [Kochen and Specker, 1967], or restrict the assignment
of values to observables under certain assumptions about how distributions
of values are related to quantum probabilities [Bell, 1964]. A theorem by Bub and
Clifton [1996] shows that if you assume that the set of definite-valued observables
has a certain structure (essentially allowing quantum probabilities to be recovered as
classical measures over distributions defined by different possible sets of values or
properties), and the pointer observable in a measurement process belongs to the set
of definite-valued observables, then the class of such theories (so-called 'modal
interpretations') is uniquely specified. More precisely, the sublattice associated with
any single observable R is a Boolean lattice, B, and a quantum state $|\psi\rangle$ defines a
classical probability measure on B, in the sense that all the single and joint probabilities
assigned by $|\psi\rangle$ to elements in B can be recovered as measures on a Kolmogorov
probability space defined on the 'phase space' X of 2-valued homomorphisms on B.
The Bub-Clifton theorem characterizes the maximal lattice extension, C, of any such
Boolean sublattice associated with an observable R and a given quantum state under
the assumption that C is an ortholattice, 24 invariant under lattice automorphisms
that preserve R and $|\psi\rangle$, for which the probabilities assigned by $|\psi\rangle$ to elements in
C can be similarly recovered as measures on a Kolmogorov probability space defined
on the 'phase space' Y of 2-valued homomorphisms on C. In this sense, the theorem
characterizes the limits of classicality in a quantum propositional structure. It turns
out that different modal interpretations can be associated with different 'determinate
sublattices' C, i.e., with different choices of a 'preferred observable' R. For standard
quantum mechanics, R is the identity, and the determinate sublattice C consists of all
quantum propositions represented by subspaces containing the state $|\psi\rangle$ (propositions
assigned probability 1 by $|\psi\rangle$) and subspaces orthogonal to $|\psi\rangle$ (propositions assigned
probability 0 by $|\psi\rangle$). Bohm's hidden variable theory can be regarded as a modal
interpretation in which the preferred observable is position in configuration space. (See
Dickson, this vol., ch. 4, and [Goldstein, 2001] for an account of Bohm's theory.)

An alternative type of 'no-collapse' solution to the Schrödinger problem is provided
by the Everett interpretation [Everett, 1957]. (See Dickson, this vol., ch. 4, for
an account.) There are a variety of Everettian interpretations in the literature, the com- 

24 I.e., an orthogonal complement exists for every element of C. 



mon theme being that all possible outcomes of a measurement are regarded as actual 
in some indexical sense, relative to different terms in the global entangled state (with 
respect to a certain preferred basis in Hilbert space), which are understood to be as- 
sociated with different worlds or different minds, depending on the version. The most 
sophisticated formulation of Everett's interpretation is probably the Saunders-Wallace 
version [Saunders, 1998], [Wallace, 2003]. Here the preferred basis is selected by decoherence
(see below), and probabilities are introduced as rational betting probabilities in
the Bayesian sense via a decision-theoretic argument originally due to Deutsch [1999].

To sum up: any solution to Schrödinger's measurement problem involves either
modifying the linear dynamics of the theory ('collapse' theories), or taking some observable
in addition to the identity as having a determinate value in every quantum
state, and modifying what the standard theory says about what propositions are true,
false, and indeterminate in a quantum state (modal interpretations, 'no collapse' hidden
variable theories), so that at the end of a measurement interaction that correlates macroscopic
pointer positions with possible values of a measured observable, the pointer
propositions and propositions referring to measured values end up having determinate
truth values. Alternatively (Everettian interpretations), we can interpret quantum mechanics
so that every measurement outcome becomes determinate in some indexical
sense (with respect to different worlds, or different minds, or different branches of the
entangled state, etc.).

We know in considerable detail what these solutions look like, in terms of how 
quantum mechanics is modified. It was a useful project to explore these solutions, be- 
cause we learnt something about quantum mechanics in the process, and perhaps there 
is more to learn by exploring the solution space further. But the point to note here 
is that all these solutions to the 'truth problem' of measurement distort quantum me- 
chanics in various ways by introducing additional structural features that obscure rather 
than illuminate our understanding of the phenomena involved in information-theoretic 
applications of entanglement, such as quantum teleportation, the possibility and im- 
possibility of certain quantum cryptographic protocols relative to classical protocols, 
the exponential speed-up of quantum computation algorithms relative to classical algo- 
rithms, and so on. 

Consider again the Bohr-Einstein dispute about the interpretation of quantum mechanics.
One might say that what separated Einstein (and Schrödinger) and Bohr was
their very different answers to what van Fraassen [1991, p. 4] has called 'the foundational
question par excellence: how could the world possibly be the way quantum theory
says it is?' This would be misleading. Einstein answered this question by arguing
that the world couldn't be the way quantum theory says it is, unless the theory is not the 
whole story (so a 'completion' of the theory — perhaps Einstein's sought-after unified 
field theory — would presumably answer the question). But Bohr's complementarity 
interpretation is not intended to be an answer to this question. Rather, complementar- 
ity should be understood as suggesting an answer to a different question: why must the 
world be the way quantum theory says it is? 

To bring out the difference between these two questions, consider Einstein's distinction
between what he called 'principle' versus 'constructive' theories. Einstein
introduced this distinction in an article on the significance of the special and general 
theories of relativity that he wrote for the London Times, which appeared in the issue 
of November 28, 1919 [1919]:



We can distinguish various kinds of theories in physics. Most of them 
are constructive. They attempt to build up a picture of the more complex 
phenomena out of the material of a relatively simple formal scheme from 
which they start out. Thus the kinetic theory of gases seeks to reduce mechanical,
thermal, and diffusional processes to movements of molecules,
i.e., to build them up out of the hypothesis of molecular motion. When we
say that we have succeeded in understanding a group of natural processes, 
we invariably mean that a constructive theory has been found which covers 
the processes in question. 

Along with this most important class of theories there exists a second, 
which I will call 'principle theories.' These employ the analytic, not the 
synthetic, method. The elements which form their basis and starting-point 
are not hypothetically constructed but empirically discovered ones, general
characteristics of natural processes, principles that give rise to mathematically
formulated criteria which the separate processes or the theoretical
representations of them have to satisfy. Thus the science of thermody- 
namics seeks by analytical means to deduce necessary conditions, which 
separate events have to satisfy, from the universally experienced fact that 
perpetual motion is impossible. 

Einstein's point was that relativity theory is to be understood as a principle theory. 
He returns to this theme in his 'Autobiographical Notes' [1949, pp. 51-52], where he
remarks that he first tried to find a constructive theory that would account for the known
properties of matter and radiation, but eventually became convinced that the solution to
the problem was to be found in a principle theory that reconciled the constancy of the 
velocity of light in vacuo for all inertial frames of reference, and the equivalence of 
inertial frames for all physical laws (mechanical as well as electromagnetic):

Reflections of this type made it clear to me as long ago as shortly after
1900, i.e., shortly after Planck's trailblazing work, that neither mechanics 
nor electrodynamics could (except in limiting cases) claim exact validity. 
By and by I despaired of the possibility of discovering the true laws by 
means of constructive efforts based on known facts. The longer and the 
more despairingly I tried, the more I came to the conviction that only the 
discovery of a universal formal principle could lead us to assured results. 
The example I saw before me was thermodynamics. The general princi- 
ple was there given in the theorem: the laws of nature are such that it is 
impossible to construct a perpetuum mobile (of the first and second kind). 
How, then, could such a universal principle be found? 

A little later [1949, p. 57], he adds:

The universal principle of the special theory of relativity is contained in the 
postulate: The laws of physics are invariant with respect to the Lorentz- 
transformations (for the transition from one inertial system to any other 
arbitrarily chosen system of inertia). This is a restricting principle for
natural laws, comparable to the restricting principle for the non-existence
of the perpetuum mobile which underlies thermodynamics.

According to Einstein, two very different sorts of theories should be distinguished 
in physics. One sort involves the reduction of a domain of relatively complex phenom- 
ena to the properties of simpler elements, as in the kinetic theory, which reduces the 
mechanical and thermal behavior of gases to the motion of molecules, the elementary 
building blocks of the constructive theory. The other sort of theory is formulated in 
terms of 'no go' principles that impose constraints on physical processes or events, 
as in thermodynamics ('no perpetual motion machines'). For an illuminating account 
of the role played by this distinction in Einstein's work, see the discussion by Martin 
Klein in [1967]. 

The special theory of relativity is a principle theory, formulated in terms of two 
principles: the equivalence of inertial frames for all physical laws (the laws of electro- 
magnetic phenomena as well as the laws of mechanics), and the constancy of the ve- 
locity of light in vacuo for all inertial frames. These principles are irreconcilable in the 
geometry of Newtonian space-time, where inertial frames are related by Galilean trans- 
formations. The required revision yields Minkowski geometry, where inertial frames 
are related by Lorentz transformations. Einstein characterizes the special principle of 
relativity, that the laws of physics are invariant with respect to Lorentz transformations 
from one inertial system to another, as 'a restricting principle for natural laws, compa- 
rable to the restricting principle for the non-existence of the perpetuum mobile which 
underlies thermodynamics.' (In the case of the general theory of relativity, the group 
of allowable transformations includes all differentiable transformations of the space- 
time manifold onto itself.) By contrast, the Lorentz theory [Lorentz, 1909], which 
derives the Lorentz transformation from the electromagnetic properties of the aether, 
and assumptions about the transmission of molecular forces through the aether, is a 
constructive theory. 
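
For concreteness, the contrast between the two kinematics can be written out for a boost with velocity v along the x-axis. This is the standard textbook form rather than anything taken from Einstein's or Lorentz's texts quoted here, and the symbols v, c, u, and \gamma are introduced only for this illustration:

```latex
% Galilean boost (Newtonian space-time): velocities simply subtract
x' = x - vt, \qquad t' = t, \qquad u' = u - v .

% Lorentz boost (Minkowski space-time)
x' = \gamma\,(x - vt), \qquad
t' = \gamma\left(t - \frac{vx}{c^{2}}\right), \qquad
\gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}, \qquad
u' = \frac{u - v}{1 - uv/c^{2}} .
```

Under the Galilean rule a light signal with u = c has u' = c - v in the moving frame, so the two principles cannot both hold. Under the Lorentz rule u = c gives u' = (c - v)/(1 - v/c) = c, so the speed of light is the same in both frames.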

The question: 

How could the world possibly be the way the quantum theory says it is? 

is motivated by a difficulty in interpreting quantum mechanics as a constructive theory, 
and the appropriate response is some constructive repair to the theory that resolves the 
difficulty, or the demonstration that the puzzling features of quantum mechanics at the 
phenomenal level (the phenomena of interference and entanglement) can be derived 
from a physically unproblematic constructive theory. 

The question: 

Why must the world be the way the quantum theory says it is? 

does not ask for a 'bottom-up' explanation of quantum phenomena in terms of a phys- 
ical ontology and dynamical laws. Rather, the question concerns a 'top-down' deriva- 
tion of quantum mechanics as a principle theory, in terms of operational constraints on 
the possibilities of manipulating phenomena. In the case of quantum mechanics, the 
relevant phenomena concern information. 

This shift in perspective between the two questions is highlighted in a remark by 
Andrew Steane in his review article on 'Quantum Computing' [1998, p. 119]: 

Historically, much of fundamental physics has been concerned with dis- 
covering the fundamental particles of nature and the equations which de- 
scribe their motions and interactions. It now appears that a different pro- 
gramme may be equally important: to discover the ways that nature al- 
lows, and prevents, information to be expressed and manipulated, rather 
than particles to move. 

Steane concludes his review with the following proposal [1998, p. 171]: 

To conclude with, I would like to propose a more wide-ranging theoretical 
task: to arrive at a set of principles like energy and momentum conserva- 
tion, but which apply to information, and from which much of quantum 
mechanics could be derived. Two tests of such ideas would be whether the 
EPR-Bell correlations thus became transparent, and whether they rendered 
obvious the proper use of terms such as 'measurement' and 'knowledge.' 

A similar shift in perspective is implicit in Wheeler's question 'Why the quantum?,' 
one of Wheeler's 'Really Big Questions' [1998]. Steane's suggestion is to answer 
the question by showing how quantum mechanics can be derived from information- 
theoretic principles. A more specific proposal along these lines originates with Gilles 
Brassard and Chris Fuchs. As remarked in §5.2.1, Brassard and Fuchs [Brassard, 2000; 
Fuchs, 1997; Fuchs, 2000; Fuchs and Jacobs, 2002] speculated that quantum mechan- 
ics could be derived from information-theoretic constraints formulated in terms of cer- 
tain primitive cryptographic protocols: specifically, the possibility of unconditionally 
secure key distribution, and the impossibility of unconditionally secure bit commit- 
ment. 

The CBH theorem (motivated by the Brassard-Fuchs conjecture) shows that quan- 
tum mechanics can be regarded as a principle theory in Einstein's sense, where the 
principles are information-theoretic constraints. So we have an answer to the question: 
why must the world be the way quantum mechanics says it is? The phenomena of in- 
terference and nonlocal entanglement are bound to occur in a world in which there are 
certain constraints on the acquisition, communication, and processing of information. 

Consider, for comparison, relativity theory, the other pillar of modern physics. A 
relativistic theory is a theory with certain symmetry or invariance properties, defined 
in terms of a group of space-time transformations. Following Einstein's formulation 
of special relativity as a principle theory, we understand this invariance to be a con- 
sequence of the fact that we live in a world in which natural processes are subject to 
certain constraints: roughly (as Hermann Bondi [1980] puts it), 'no overtaking of light 
by light,' and 'velocity doesn't matter' (for electromagnetic as well as mechanical phe- 
nomena). (Recall Einstein's characterization of the special principle of relativity as 'a 
restricting principle for natural laws, comparable to the restricting principle of the non- 
existence of the perpetuum mobile which underlies thermodynamics.') Without Ein- 
stein's analysis, the transformations of Minkowski space-time would simply be a rather 
puzzling algorithm for relativistic kinematics and the Lorentz transformation, which is 
incompatible with the kinematics of Newtonian space-time. What Einstein's analysis 
provides is a rationale for taking the structure of space and time as Minkowskian: we 
see that this is required for the consistency of the two principles of special relativity. 

A quantum theory is a theory in which the observables and states have a certain 
characteristic algebraic structure. Unlike relativity theory, quantum mechanics was 
born as a recipe or algorithm for calculating the expectation values of observables 
measured by macroscopic measuring instruments. A theory with a commutative C*- 
algebra has a phase space representation, not necessarily the phase space of classical 
mechanics, but a theory in which the observables of the C*-algebra are replaced by 
'beables' (Bell's term, see [1987]), and the C*-algebraic states are replaced by beable- 
states representing complete lists of properties (idempotent quantities). In this case, it 
is possible to extend the theory to include the measuring instruments that are the source 
of the C*-algebraic statistics, so that they are no longer 'black boxes' but constructed 
out of systems that are characterized by properties and states of the phase space theory. 
That is, the C*-algebraic theory can be replaced by a 'detached observer' theory of 
the physical processes underlying the phenomena, to use Pauli's term [Born, 1971, p. 
218], including the processes involved in the functioning of measuring instruments. 
Note that this depends on a representation theorem. In the noncommutative case, we 
are guaranteed only the existence of a Hilbert space representation of the C*-algebra, 
and it is an open question whether a 'detached observer' description of the phenomena 
is possible. 

Solving Schrödinger's problem (the truth problem) amounts to a proposal to treat 
quantum mechanics as a failed or incomplete constructive theory in need of construc- 
tive repair. In effect, the problem is how to account for quantum information — the 
puzzling features of interference and nonlocal entanglement — in a theoretical frame- 
work in which only classical information is meaningful in a fundamental sense. If we 
treat quantum mechanics as a principle theory of information, the core foundational 
problem is the probability problem. From this perspective, the problem is how to ac- 
count for the appearance of classical information in a quantum world characterized by 
information-theoretic constraints. 

One might complain that treating quantum mechanics as a principle theory amounts 
to simply postulating what is ultimately explained by a constructive theory like the 
GRW theory or Bohm's theory. This would amount to rejecting the idea that a principle 
theory can be explanatory. From the perspective adopted here, Bohm's constructive 
theory in relation to quantum mechanics is like Lorentz's constructive theory of the 
electron in relation to special relativity. Cushing [1998, p. 204] quotes Lorentz (from 
the conclusion of the 1916 edition of The Theory of Electrons) as complaining similarly 
that 'Einstein simply postulates what we have deduced.' 

I cannot speak here of the many highly interesting applications which 
Einstein has made of this principle [of relativity]. His results concerning 
electromagnetic and optical phenomena ... agree in the main with those 
which we have obtained in the preceding pages, the chief difference being 
that Einstein simply postulates what we have deduced, with some diffi- 
culty and not altogether satisfactorily, from the fundamental equations of 
the electromagnetic field. By doing so, he may certainly take credit for 
making us see in the negative result of experiments like those of Michel- 
son, Rayleigh and Brace, not a fortuitous compensation of opposing ef- 
fects, but the manifestation of a general and fundamental principle. 

Yet, I think, something may also be claimed in favour of the form in 
which I have presented the theory. I cannot but regard the aether, which 
can be the seat of an electromagnetic field with its energy and its vibra- 
tions, as endowed with a certain degree of substantiality, however differ- 
ent it may be from all ordinary matter. In this line of thought, it seems 
natural not to assume at starting that it can never make any difference 
whether a body moves through the aether or not, and to measure distances 
and lengths of time by means of rods and clocks having a fixed position 
relative to the aether. 

Note that Lorentz's theory is constrained by the principles of special relativity, 
which means that the aether as a rest frame for electromagnetic phenomena must, in 
principle, be undetectable. So such a theory can have no excess empirical content over 
special relativity. Cushing [1998, p. 193] also quotes Maxwell as asking whether 'it is 
not more philosophical to admit the existence of a medium which we cannot at present 
perceive, than to assert that a body can act at a place where it is not.' Yes, but not 
if we also have to admit that, in principle, as a matter of physical law, if we live in a 
world in which events are constrained by the two relativistic principles, the medium 
must remain undetectable. 

You can, if you like, tell a constructive story about quantum phenomena, but such 
an account, if constrained by the information-theoretic principles, will have no ex- 
cess empirical content over quantum mechanics. Putting this differently, a solution to 
Schrödinger's truth problem that has excess empirical content over quantum mechan- 
ics must violate one or more of the CBH information-theoretic constraints. So, e.g., 
a Bohmian theory of quantum phenomena is like an aether theory for electromagnetic 
fields. Just as the aether theory attempts to make sense of the behaviour of fields by 
proposing an aether that is a sort of sui generis mechanical system different from all 
other mechanical systems, so Bohm's theory attempts to make sense of quantum phe- 
nomena by introducing a field (the quantum potential or guiding field) that is a sort of 
sui generis field different from other physical fields. 

The crucial distinction here is between a constructive theory formulated in terms 
of a physical ontology and dynamical laws ('bottom-up') and a principle theory for- 
mulated in terms of operational constraints at the phenomenal level ('top-down'). A 
constructive theory introduces an algebra of beables and beable-states. A principle the- 
ory introduces an algebra of observables and observable-states, which are essentially 
probability measures. 

It seems clear that the algebra of observables will be non-trivially distinct from the 
algebra of beables if cloning is impossible. For if a constructive theory for a certain 
domain of phenomena allows dynamical interactions in which a beable of one sys- 
tem, designated as the measuring instrument, can become correlated with a beable of 
another system, designated as the measured system, without disturbing the values of 
other beables of the measured system, we can take such an interaction as identifying 
the value of the beable in question (in the sense that the value of a beable of one system 
is recorded in the value of a beable of a second system). If this is possible, then it will 
be possible to simultaneously measure any number of beables of a system by concate- 
nating measurement interactions, and so it will be possible in principle to identify any 
arbitrary beable state. If we assume that we can prepare any state, then the possibility 
of identifying an arbitrary state means that we can construct a device that could copy 
any arbitrary state. So if we cannot construct such a device, then measurement in this 
sense must also be impossible. It follows that a 'measurement' in the constructive the- 
ory will be something other than the mere identification of a beable value of a system, 
without disturbance, and the question of what the observables are in such a theory will 
require a non-trivial analysis. 
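
The link between identifying arbitrary states and copying them has a familiar counterpart within quantum mechanics itself. As a sketch only, here is the standard textbook no-cloning argument, which is not the beable argument given above. The copying unitary U, the states |\psi\rangle and |\phi\rangle, and the blank state |0\rangle are symbols introduced here for illustration. Suppose a single unitary U copied two different states:

```latex
U\,|\psi\rangle|0\rangle = |\psi\rangle|\psi\rangle, \qquad
U\,|\phi\rangle|0\rangle = |\phi\rangle|\phi\rangle .

% Unitarity preserves inner products, so comparing both sides gives
\langle\psi|\phi\rangle\,\langle 0|0\rangle = \langle\psi|\phi\rangle^{2}
\quad\Longrightarrow\quad
\langle\psi|\phi\rangle \in \{0,\,1\} .
```

On these assumptions a single device can copy only states that are either identical or orthogonal, that is, states that can already be distinguished perfectly.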

Such an analysis is indeed given by Bohm in Part II of his two-part 1952 paper 
on hidden variables [1952], and a more careful and sophisticated analysis is given 
by [Dürr et al., 2003] for their 'Bohmian mechanics' version of Bohm's theory. As 
one would expect (given the equilibrium distribution assumption which ensures that 
Bohm's theory is empirically indistinguishable from quantum mechanics), while the 
beables are functions of position in configuration space (and form a commutative alge- 
bra), the observables of the theory are just the observables of quantum theory and form 
a noncommutative algebra. 

The CBH theorem assumes that, for the theories we are concerned with, the ob- 
servables form a C*-algebra. The content of the CBH theorem is that, given cer- 
tain information-theoretic constraints, the C*-algebra of observables and observable 
states takes a certain form characteristic of quantum theories. The theorem says noth- 
ing about beables and beable-states, and does not address the measurement problem 
(Schrödinger's truth problem), let alone solve it. But from the perspective adopted 
here, the measurement problem is simply the observation that cloning is impossible, 
and a 'solution to the measurement problem' is the proposal of a physical ontology and 
dynamics and an analysis of measurement that yields the observables and observable- 
states of standard quantum mechanics. Such theories provide possible explanations for 
the impossibility of cloning. But since there are now a variety of such explanations 
available, and (assuming the CBH information-theoretic principles) there are no em- 
pirical constraints, in principle, that could distinguish these explanations, there seems 
little point in pursuing the question further. A constructive theory whose sole moti- 
vation is to 'solve the measurement problem' seems unlikely to survive fundamental 
advances in physics driven by other theoretical or experimental problems. 

The probability problem — the core foundational problem for the interpretation of 
quantum mechanics as a principle theory of information — can be put this way: From 
the information-theoretic constraints, we get a noncommutative (or non-Boolean) the- 
ory of correlations for which there is no phase space representation. One can define, in 
a unique way (according to Gleason's theorem) generalized 'transition probabilities' or 
'transition weights' associated with certain structural features of the noncommutative 
structure: the angles between geometrical elements representing quantum 'proposi- 
tions.' The problem is how to understand these weights as representing probabilities, 
without reducing the problem to a solution of the truth problem. 
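
As a reminder of the form these weights take, the content of Gleason's theorem can be stated as follows. This is the standard formulation, supplied here for illustration; the symbols \mu, \rho, P, \psi, \phi, and \theta are not used in the text above:

```latex
% Gleason: on a Hilbert space of dimension >= 3, every countably additive
% probability measure \mu on the projection operators P has the form
\mu(P) = \operatorname{Tr}(\rho P)
\quad \text{for a unique density operator } \rho .

% For a pure state \rho = |\psi\rangle\langle\psi| and a one-dimensional
% projection P = |\phi\rangle\langle\phi| this reduces to
\mu(P) = |\langle\phi|\psi\rangle|^{2} = \cos^{2}\theta ,
```

where \theta is the angle between the rays spanned by \psi and \phi. This is the sense in which the weights are fixed by angles between geometrical elements of the noncommutative structure.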

It seems clear that we need to take account of the phenomenon of decoherence (see 
Landsman, this vol., ch. 5; Dickson, this vol., ch. 4; [Zurek, 2003; Ollivier et al., 2004]): 
an extremely fast process that occurs in the spontaneous interaction between a macrosys- 
tem and its environment that leads to the virtually instantaneous suppression of quan- 
tum interference. What happens, roughly, is that a macrosystem like Schrödinger's cat 
typically becomes correlated with the environment (an enormous number of stray dust 
particles, air molecules, photons, background radiation, etc.) in an entangled state that 
takes a certain form with respect to a preferred set of basis states, which remain stable 
as the interaction develops and includes more and more particles. It is as if the environ- 
ment is 'monitoring' the system via a measurement of properties associated with the 
preferred states, in such a way that information about these properties is stored redun- 
dantly in the environment. This stability, or robustness, of the preferred basis, and the 
redundancy of the information in the environment, allows one to identify certain emer- 
gent structures in the overall pattern of correlations — such as macroscopic pointers and 
cats and information-gatherers in general — as classical-like: the correlational informa- 
tion required to reveal quantum interference for these structures is effectively lost in 
the environment. So it appears that the information-theoretic constraints are consistent 
with both (i) the conditions for the existence of measuring instruments as sources of 
classical information, and (ii) the existence of information-gatherers with the ability to 
use measuring instruments to apply and test quantum mechanics, given a characteriza- 
tion of part of the overall system as the environment. That is, decoherence provides 
an explanation for the emergence of classical information in a quantum correlational 
structure. 
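
The suppression of interference described here can be sketched schematically. This is an idealized textbook model, not a derivation from the works cited above. The preferred system states |s_i\rangle, the environment states |E_i\rangle, and the amplitudes c_i are notation assumed for the illustration:

```latex
% Interaction correlates each preferred system state with an environment state
\Big(\sum_i c_i\,|s_i\rangle\Big)\,|E_0\rangle
\;\longrightarrow\;
|\Psi\rangle = \sum_i c_i\,|s_i\rangle\,|E_i\rangle .

% Reduced state of the system, ignoring (tracing out) the environment
\rho_S = \operatorname{Tr}_E\,|\Psi\rangle\langle\Psi|
       = \sum_{i,j} c_i\,c_j^{*}\,\langle E_j|E_i\rangle\,|s_i\rangle\langle s_j| .

% As the environment states become effectively orthogonal,
% \langle E_j|E_i\rangle \to \delta_{ij}, the interference terms vanish:
\rho_S \;\to\; \sum_i |c_i|^{2}\,|s_i\rangle\langle s_i| .
```

The off-diagonal terms, which carry the interference, are not destroyed. They are displaced into correlations with the environment, which is why the correlational information is described as effectively lost rather than erased.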

If something like the above account of decoherence is acceptable, then the prob- 
ability problem reduces to showing that the probabilities assigned to measurement 
outcomes by these information-gatherers, in the subjective Bayesian sense, are just 
the Gleason generalized transition probabilities. That is, we need to show that, while 
quantum theory, at the fundamental level, is a noncommutative theory of correlations 
for which there is no phase space representation, it is also a theory of the probabilistic 
behavior of information-gatherers, certain emergent structures in the pattern of correla- 
tions when correlational information in their environment is ignored. For an argument 
along these lines, see [Pitowsky, 2002]. 

On the view proposed here, no measurement outcomes are certified as determi- 
nate by the theory. Rather, measuring instruments are sources of classical information, 
where the individual occurrence of a particular distinguishable event ('symbol') pro- 
duced stochastically by the information source lies outside the theory. In this sense, a 
measuring instrument, insofar as it functions as a classical information source, is still 
ultimately a 'black box' in the theory. So a quantum description will have to introduce 
a 'cut' between what we take to be the ultimate measuring instrument in a given mea- 
surement process and the quantum phenomenon revealed by the instrument. But this 
'cut' is no longer ad hoc, or mysterious, or in some other way problematic, as it is in 
the Copenhagen interpretation (see Landsman, this vol., ch. 5). For here the 'cut' 
just reflects the fundamental interpretative claim: that quantum mechanics is a theory 
about the representation and manipulation of information constrained by the possibili- 
ties and impossibilities of information-transfer in our world, rather than a theory about 
the ways in which nonclassical waves and particles move. 